Mark Rowland

2024

J. Mach. Learn. Res., 2024

Foundations of Multivariate Distributional Reinforcement Learning.

CoRR, 2024

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning.

CoRR, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.

CoRR, 2024

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model.

CoRR, 2024

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling.

CoRR, 2024

A Distributional Analogue to the Successor Representation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Distributional Bellman Operators over Mean Embeddings.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.

Michal Valko

Bernardo Ávila Pires

Bilal Piot

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Nash Learning from Human Feedback.

Michal Valko

Daniele Calandriello

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

A Kernel Perspective on Behavioural Metrics for Markov Decision Processes.

Trans. Mach. Learn. Res., 2023

Nash Learning from Human Feedback.

Michal Valko

Daniele Calandriello

CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.

CoRR, 2023

VA-learning as a more efficient alternative to Q-learning.

Proceedings of the International Conference on Machine Learning, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.

Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.

Yunhao Tang

Zhaohan Daniel Guo

Proceedings of the International Conference on Machine Learning, 2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation.

Proceedings of the International Conference on Machine Learning, 2023

Quantile Credit Assignment.

Proceedings of the International Conference on Machine Learning, 2023

Bootstrapped Representations in Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2023

A Novel Stochastic Gradient Descent Algorithm for Learning Principal Subspaces.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Figure Data for the paper "Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning".

Dataset, October, 2022

Evolutionary Dynamics and Phi-Regret Minimization in Games.

J. Artif. Intell. Res., 2022

Learning Correlated Equilibria in Mean-Field Games.

CoRR, 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.

CoRR, 2022

Learning Dynamics and Generalization in Reinforcement Learning.

CoRR, 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Generalised Policy Improvement with Geometric Policy Composition.

Proceedings of the International Conference on Machine Learning, 2022

Learning Dynamics and Generalization in Deep Reinforcement Learning.

Proceedings of the International Conference on Machine Learning, 2022

Understanding and Preventing Capacity Loss in Reinforcement Learning.

Clare Lyle

Will Dabney

Proceedings of the Tenth International Conference on Learning Representations, 2022

Learning Equilibria in Mean-Field Games: Introducing Mean-Field PSRO.

Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022

Marginalized Operators for Off-policy Reinforcement Learning.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Game Plan: What AI can do for Football, and What Football can do for AI.

J. Artif. Intell. Res., 2021

Evolutionary Dynamics and Φ-Regret Minimization in Games.

CoRR, 2021

MICo: Learning improved representations via sampling-based state similarity for Markov decision processes.

CoRR, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

MICo: Improved representations via sampling-based state similarity for Markov decision processes.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Taylor Expansion of Discount Factors.

Proceedings of the 38th International Conference on Machine Learning, 2021

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization.

Julien Pérolat

Jean-Baptiste Lespiau

Proceedings of the 38th International Conference on Machine Learning, 2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning.

Proceedings of the 38th International Conference on Machine Learning, 2021

On the Effect of Auxiliary Tasks on Representation Dynamics.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

The Value-Improvement Path: Towards Better Representations for Reinforcement Learning.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Navigating the Landscape of Games.

Shayegan Omidshafiei

Karl Tuyls

Wojciech M. Czarnecki

CoRR, 2020

Fast computation of Nash Equilibria in Imperfect Information Games.

Julien Pérolat

Jean-Baptiste Lespiau

Edward Lockhart

Karl Tuyls

Proceedings of the 37th International Conference on Machine Learning, 2020

Revisiting Fundamentals of Experience Replay.

Proceedings of the 37th International Conference on Machine Learning, 2020

A Generalized Training Approach for Multiagent Learning.

Proceedings of the 8th International Conference on Learning Representations, 2020

Conditional Importance Sampling for Off-Policy Learning.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Adaptive Trade-Offs in Off-Policy Learning.

Will Dabney

Christos H. Papadimitriou

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

Antithetic and Monte Carlo kernel estimators for partial rankings.

Stat. Comput., 2019

Meta-learning of Sequential Strategies.

CoRR, 2019

α-Rank: Multi-Agent Evaluation by Evolution.

Shayegan Omidshafiei

Georgios Piliouras

Karl Tuyls

Jean-Baptiste Lespiau

Wojciech M. Czarnecki

Marc Lanctot

Julien Pérolat

CoRR, 2019

Multiagent Evaluation under Incomplete Information.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Statistics and Samples in Distributional Reinforcement Learning.

Proceedings of the 36th International Conference on Machine Learning, 2019

Unifying Orthogonal Monte Carlo Methods.

Wenyu Chen

Proceedings of the 36th International Conference on Machine Learning, 2019

Orthogonal Estimation of Wasserstein Distances.

Jiri Hron

Yunhao Tang

Tamás Sarlós

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Geometrically Coupled Monte Carlo Sampling.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Structured Evolution with Compact Architectures for Scalable Policy Optimization.

Alexander G. de G. Matthews

Proceedings of the 35th International Conference on Machine Learning, 2018

Gaussian Process Behaviour in Wide Deep Neural Networks.

Proceedings of the 6th International Conference on Learning Representations, 2018

An Analysis of Categorical Distributional Reinforcement Learning.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

The Geometry of Random Features.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Distributional Reinforcement Learning With Quantile Regression.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Uprooting and Rerooting Higher-Order Graphical Models.

Krzysztof Marcin Choromanski

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Magnetic Hamiltonian Monte Carlo.

Proceedings of the 34th International Conference on Machine Learning, 2017

Conditions beyond treewidth for tightness of higher-order LP relaxations.

Aldo Pacchiano

José Miguel Hernández-Lobato

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016

Black-Box Alpha Divergence Minimization.

Yingzhen Li

Thang D. Bui

Daniel Hernández-Lobato

Richard E. Turner

Proceedings of the 33rd International Conference on Machine Learning, 2016

Tightness of LP Relaxations for Almost Balanced Models.
