Rémi Munos

According to our database, Rémi Munos authored at least 225 papers between 1996 and 2024.

Bibliography

2024
An Analysis of Quantile Temporal-Difference Learning.
J. Mach. Learn. Res., 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment.
CoRR, 2024

Multi-turn Reinforcement Learning from Preference Human Feedback.
CoRR, 2024

Understanding the performance gap between online and offline alignment algorithms.
CoRR, 2024

Super-Exponential Regret for UCT, AlphaGo and Variants.
CoRR, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.
CoRR, 2024

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model.
CoRR, 2024

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling.
CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Nash Learning from Human Feedback.
CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.
CoRR, 2023

Local and adaptive mirror descents in extensive-form games.
CoRR, 2023

Model-free Posterior Sampling via Learning Rate Randomization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fast Rates for Maximum Entropy Exploration.
Proceedings of the International Conference on Machine Learning, 2023

VA-learning as a more efficient alternative to Q-learning.
Proceedings of the International Conference on Machine Learning, 2023

Towards a better understanding of representation dynamics under TD-learning.
Proceedings of the International Conference on Machine Learning, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.
Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation.
Proceedings of the International Conference on Machine Learning, 2023

Quantile Credit Assignment.
Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.
Proceedings of the International Conference on Machine Learning, 2023

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments.
Proceedings of the International Conference on Machine Learning, 2023

Adapting to game trees in zero-sum imperfect information games.
Proceedings of the International Conference on Machine Learning, 2023

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition.
Proceedings of the International Conference on Machine Learning, 2023

2022
Curiosity in hindsight.
CoRR, 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.
CoRR, 2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.
CoRR, 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Generalised Policy Improvement with Geometric Policy Composition.
Proceedings of the International Conference on Machine Learning, 2022

Large-Scale Representation Learning on Graphs via Bootstrapping.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Concave Utility Reinforcement Learning: The Mean-field Game Viewpoint.
Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022

Marginalized Operators for Off-policy Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Concentration bounds for temporal difference learning with linear function approximation: the case of batch data and uniform sampling.
Mach. Learn., 2021

Game Plan: What AI can do for Football, and What Football can do for AI.
J. Artif. Intell. Res., 2021

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall.
CoRR, 2021

Bootstrapped Representation Learning on Graphs.
CoRR, 2021

Geometric Entropic Exploration.
CoRR, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning in two-player zero-sum partially observable Markov games with perfect recall.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Taylor Expansion of Discount Factors.
Proceedings of the 38th International Conference on Machine Learning, 2021

From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization.
Proceedings of the 38th International Conference on Machine Learning, 2021

Counterfactual Credit Assignment in Model-Free Reinforcement Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
A distributional code for value in dopamine-based reinforcement learning.
Nat., 2020

Counterfactual Credit Assignment in Model-Free Reinforcement Learning.
CoRR, 2020

The Advantage Regret-Matching Actor-Critic.
CoRR, 2020

Navigating the Landscape of Games.
CoRR, 2020

Leverage the Average: an Analysis of Regularization in RL.
CoRR, 2020

Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Taylor Expansion Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Fast computation of Nash Equilibria in Imperfect Information Games.
Proceedings of the 37th International Conference on Machine Learning, 2020

Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Monte-Carlo Tree Search as Regularized Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Generalized Training Approach for Multiagent Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020

Conditional Importance Sampling for Off-Policy Learning.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Adaptive Trade-Offs in Off-Policy Learning.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Neural Replicator Dynamics.
CoRR, 2019

α-Rank: Multi-Agent Evaluation by Evolution.
CoRR, 2019

World Discovery Models.
CoRR, 2019

Multiagent Evaluation under Incomplete Information.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Hindsight Credit Assignment.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Planning in entropy-regularized Markov decision processes and games.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Statistics and Samples in Distributional Reinforcement Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

Recurrent Experience Replay in Distributed Reinforcement Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Universal Successor Features Approximators.
Proceedings of the 7th International Conference on Learning Representations, 2019

Observational Learning by Reinforcement Learning.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

The Termination Critic.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Optimistic planning with an adaptive number of action switches for near-optimal nonlinear control.
Eng. Appl. Artif. Intell., 2018

Neural Predictive Belief Representations.
CoRR, 2018

Observe and Look Further: Achieving Consistent Performance on Atari.
CoRR, 2018

Low-pass Recurrent Neural Networks - A memory architecture for longer-term correlation discovery.
CoRR, 2018

A Study on Overfitting in Deep Reinforcement Learning.
CoRR, 2018

Continuous-action planning for discounted infinite-horizon nonlinear optimal control with Lipschitz values.
Autom., 2018

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Optimistic optimization of a Brownian.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Autoregressive Quantile Networks for Generative Modeling.
Proceedings of the 35th International Conference on Machine Learning, 2018

The Uncertainty Bellman Equation and Exploration.
Proceedings of the 35th International Conference on Machine Learning, 2018

Learning to Search with MCTSnets.
Proceedings of the 35th International Conference on Machine Learning, 2018

IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures.
Proceedings of the 35th International Conference on Machine Learning, 2018

Implicit Quantile Networks for Distributional Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement.
Proceedings of the 35th International Conference on Machine Learning, 2018

The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

Noisy Networks For Exploration.
Proceedings of the 6th International Conference on Learning Representations, 2018

Maximum a Posteriori Policy Optimisation.
Proceedings of the 6th International Conference on Learning Representations, 2018

An Analysis of Categorical Distributional Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

Distributional Reinforcement Learning With Quantile Regression.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
The Reactor: A Sample-Efficient Actor-Critic Architecture.
CoRR, 2017

Noisy Networks for Exploration.
CoRR, 2017

Observational Learning by Reinforcement Learning.
CoRR, 2017

The Cramer Distance as a Solution to Biased Wasserstein Gradients.
CoRR, 2017

Successor Features for Transfer in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Count-Based Exploration with Neural Density Models.
Proceedings of the 34th International Conference on Machine Learning, 2017

Automated Curriculum Learning for Neural Networks.
Proceedings of the 34th International Conference on Machine Learning, 2017

A Distributional Perspective on Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Minimax Regret Bounds for Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Combining policy gradient and Q-learning.
Proceedings of the 5th International Conference on Learning Representations, 2017

Sample Efficient Actor-Critic with Experience Replay.
Proceedings of the 5th International Conference on Learning Representations, 2017

Learning to reinforcement learn.
Proceedings of the 39th Annual Meeting of the Cognitive Science Society, 2017

2016
Guest Editors' foreword.
Theor. Comput. Sci., 2016

Analysis of Classification-based Policy Iteration Algorithms.
J. Mach. Learn. Res., 2016

PGQ: Combining policy gradient and Q-learning.
CoRR, 2016

Successor Features for Transfer in Reinforcement Learning.
CoRR, 2016

Safe and Efficient Off-Policy Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Memory-Efficient Backpropagation Through Time.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Unifying Count-Based Exploration and Intrinsic Motivation.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Discounted near-optimal control of general continuous-action nonlinear systems using optimistic planning.
Proceedings of the 2016 American Control Conference, 2016

Q(λ) with Off-Policy Corrections.
Proceedings of the Algorithmic Learning Theory - 27th International Conference, 2016

Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Increasing the Action Gap: New Operators for Reinforcement Learning.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Adaptive strategy for stratified Monte Carlo sampling.
J. Mach. Learn. Res., 2015

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits.
CoRR, 2015

Black-box optimization of noisy functions with unknown smoothness.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Cheap Bandits.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Toward Minimax Off-policy Value Estimation.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Fast Gradient Descent for Drifting Least Squares Regression, with Application to Bandits.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Regret bounds for restless Markov bandits.
Theor. Comput. Sci., 2014

Minimax number of strata for online stratified sampling: The case of noisy samples.
Theor. Comput. Sci., 2014

From Bandits to Monte-Carlo Tree Search: The Optimistic Principle Applied to Optimization and Planning.
Found. Trends Mach. Learn., 2014

On Minimax Optimal Offline Policy Evaluation.
CoRR, 2014

Relative confidence sampling for efficient on-line ranker evaluation.
Proceedings of the Seventh ACM International Conference on Web Search and Data Mining, 2014

Fast LSTD Using Stochastic Approximation: Finite Time Analysis and Application to Traffic Control.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

Optimistic Planning in Markov Decision Processes Using a Generative Model.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Best-Arm Identification in Linear Bandits.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Active Regression by Stratification.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Bounded Regret for Finite-Armed Structured Bandits.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Efficient learning by implicit exploration in bandit problems with side observations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem.
Proceedings of the 31th International Conference on Machine Learning, 2014

Spectral Bandits for Smooth Graph Functions.
Proceedings of the 31th International Conference on Machine Learning, 2014

Bandits attack function optimization.
Proceedings of the IEEE Congress on Evolutionary Computation, 2014

Optimistic planning with a limited number of action switches for near-optimal nonlinear control.
Proceedings of the 53rd IEEE Conference on Decision and Control, 2014

An analysis of optimistic, best-first search for minimax sequential decision making.
Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2014

Spectral Thompson Sampling.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model.
Mach. Learn., 2013

Analysis of stochastic approximation for efficient least squares regression and LSTD.
CoRR, 2013

Online gradient descent for least squares regression: Non-asymptotic bounds and application to bandits.
CoRR, 2013

Finite-Time Analysis of Kernelised Contextual Bandits.
Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013

Thompson Sampling for 1-Dimensional Exponential Family Bandits.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, 2013

Aggregating Optimistic Planning Trees for Solving Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, 2013

Stochastic Simultaneous Optimistic Optimization.
Proceedings of the 30th International Conference on Machine Learning, 2013

Toward Optimal Stratification for Stratified Monte-Carlo Integration.
Proceedings of the 30th International Conference on Machine Learning, 2013

Editors' Introduction.
Proceedings of the Algorithmic Learning Theory - 24th International Conference, 2013

Optimistic planning for belief-augmented Markov Decision Processes.
Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2013

Optimistic planning for continuous-action deterministic systems.
Proceedings of the 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2013

2012
Linear regression with random projections.
J. Mach. Learn. Res., 2012

Finite-sample analysis of least-squares policy iteration.
J. Mach. Learn. Res., 2012

Bandit Theory meets Compressed Sensing for high dimensional Stochastic Linear Bandit.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Optimistic planning for Markov decision processes.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Learning with stochastic inputs and adversarial outputs.
J. Comput. Syst. Sci., 2012

Thompson Sampling: An Optimal Finite Time Analysis.
CoRR, 2012

Risk-Aversion in Multi-armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012

Bandit Algorithms boost Brain Computer Interfaces for motor-task selection of a brain-controlled button.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012

Adaptive Stratified Sampling for Monte-Carlo integration of Differentiable functions.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012, 2012

On the Sample Complexity of Reinforcement Learning with a Generative Model.
Proceedings of the 29th International Conference on Machine Learning, 2012

Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Minimax Number of Strata for Online Stratified Sampling Given Noisy Samples.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Least-Squares Methods for Policy Iteration.
Reinforcement Learning: State-of-the-Art, 2012

2011
Pure exploration in finitely-armed and continuous-armed bandits.
Theor. Comput. Sci., 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.
Proceedings of the COLT 2011, 2011

Adaptive Bandits: Towards the best history-dependent strategy.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

X-Armed Bandits.
J. Mach. Learn. Res., 2011

Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Selecting the State-Representation in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Sparse Recovery with Brownian Sensing.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Finite Time Analysis of Stratified Sampling for Monte Carlo.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Speedy Q-Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, 2011

Finite-Sample Analysis of Lasso-TD.
Proceedings of the 28th International Conference on Machine Learning, 2011

Regularized Least Squares Temporal Difference Learning with Nested ℓ2 and ℓ1 Penalization.
Proceedings of the Recent Advances in Reinforcement Learning - 9th European Workshop, 2011

Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits.
Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

Optimistic planning for sparsely stochastic systems.
Proceedings of the 2011 IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning, 2011

2010
Finite-sample Analysis of Bellman Residual Minimization.
Proceedings of the 2nd Asian Conference on Machine Learning, 2010

X-Armed Bandits.
CoRR, 2010

Online Learning in Adversarial Lipschitz Environments.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2010

Scrambled Objects for Least-Squares Regression.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

LSTD with Random Projections.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

Error Propagation for Approximate Policy and Value Iteration.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010, 2010

Finite-Sample Analysis of LSTD.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Analysis of a Classification-based Policy Iteration Algorithm.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Open Loop Optimistic Planning.
Proceedings of the COLT 2010, 2010

Best Arm Identification in Multi-Armed Bandits.
Proceedings of the COLT 2010, 2010

2009
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.
Theor. Comput. Sci., 2009

Compressed Least-Squares Regression.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009, 2009

Sensitivity analysis in HMMs with application to likelihood maximization.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009, 2009

Workshop summary: On-line learning with limited feedback.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Hybrid Stochastic-Adversarial On-line Learning.
Proceedings of the COLT 2009, 2009

Pure Exploration in Multi-armed Bandits Problems.
Proceedings of the Algorithmic Learning Theory, 20th International Conference, 2009

2008
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path.
Mach. Learn., 2008

Finite-Time Bounds for Fitted Value Iteration.
J. Mach. Learn. Res., 2008

Pure Exploration for Multi-Armed Bandit Problems.
CoRR, 2008

Algorithms for Infinitely Many-Armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Particle Filter-based Policy Gradient in POMDPs.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Online Optimization in X-Armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Optimistic Planning of Deterministic Systems.
Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Adaptive play in Texas Hold'em Poker.
Proceedings of the ECAI 2008, 2008

2007
Performance Bounds in Lp-norm for Approximate Value Iteration.
SIAM J. Control. Optim., 2007

Lp-norm analysis of the value iteration algorithm with approximations.
Rev. d'Intelligence Artif., 2007

Bandit Algorithms for Tree Search.
Proceedings of the UAI 2007, 2007

Fitted Q-iteration in continuous action-space MDPs.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Tuning Bandit Algorithms in Stochastic Environments.
Proceedings of the Algorithmic Learning Theory, 18th International Conference, 2007

2006
Policy Gradient in Continuous Time.
J. Mach. Learn. Res., 2006

Geometric Variance Reduction in Markov Chains: Application to Value Function and Gradient Estimation.
J. Mach. Learn. Res., 2006

2005
Sensitivity Analysis Using Itô-Malliavin Calculus and Martingales, and Application to Stochastic Optimal Control.
SIAM J. Control. Optim., 2005

Finite time bounds for sampling based fitted value iteration.
Proceedings of the 22nd International Conference on Machine Learning, 2005

Error Bounds for Approximate Value Iteration.
Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI), 2005

2003
Error Bounds for Approximate Policy Iteration.
Proceedings of the 20th International Conference on Machine Learning, 2003

2002
Variable Resolution Discretization in Optimal Control.
Mach. Learn., 2002

2001
Efficient Resources Allocation for Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 14, 2001

2000
A Study of Reinforcement Learning in the Continuous Case by the Means of Viscosity Solutions.
Mach. Learn., 2000

Rates of Convergence for Variable Resolution Schemes in Optimal Control.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 2000

1999
Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation.
Proceedings of the International Joint Conference Neural Networks, 1999

Variable Resolution Discretization for High-Accuracy Solutions of Optimal Control Problems.
Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999

1998
Barycentric Interpolators for Continuous Space and Time Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 11, 1998

A General Convergence Method for Reinforcement Learning in the Continuous Case.
Proceedings of the Machine Learning: ECML-98, 1998

1997
Reinforcement Learning for Continuous Stochastic Control Problems.
Proceedings of the Advances in Neural Information Processing Systems 10, 1997

A Convergent Reinforcement Learning Algorithm in the Continuous Case Based on a Finite Difference Method.
Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1997

Finite-Element Methods with Local Triangulation Refinement for Continuous Reinforcement Learning Problems.
Proceedings of the Machine Learning: ECML-97, 1997

1996
A Convergent Reinforcement Learning Algorithm in the Continuous Case: The Finite-Element Reinforcement Learning.
Proceedings of the 13th International Conference on Machine Learning, 1996

