Odalric-Ambrym Maillard

Affiliations:
  • Technion, Haifa, Faculty of Electrical Engineering


According to our database, Odalric-Ambrym Maillard authored at least 84 papers between 2009 and 2024.

Bibliography

2024
Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithms.
Trans. Mach. Learn. Res., 2024

How to Shrink Confidence Sets for Many Equivalent Discrete Distributions?
CoRR, 2024

Power Mean Estimation in Stochastic Monte-Carlo Tree Search.
CoRR, 2024

Bandits with Multimodal Structure.
Proceedings of the 1st Reinforcement Learning Conference, 2024

Evaluation of kernel selection criteria for kernel ridge regression with small datasets. (Évaluation de critères de sélection de noyaux pour la régression Ridge à noyau dans un contexte de petits jeux de données).
Proceedings of the Extraction et Gestion des Connaissances, 2024

CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption.
Proceedings of the International Conference on Algorithmic Learning Theory, 2024

2023
Monte-Carlo tree search with uncertainty propagation via optimal transport.
CoRR, 2023

AdaStop: sequential testing for efficient and reliable comparisons of Deep RL Agents.
CoRR, 2023

Fast Asymptotically Optimal Algorithms for Non-Parametric Stochastic Bandits.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Bregman Deviations of Generic Exponential Families.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

Risk-aware linear bandits with convex loss.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Exploration in Reward Machines with Low Regret.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

Logarithmic regret in communicating MDPs: Leveraging known dynamics with bandits.
Proceedings of the Asian Conference on Machine Learning, 2023

Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration & Planning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Collaborative Algorithms for Online Personalized Mean Estimation.
Trans. Mach. Learn. Res., 2022

Efficient Change-Point Detection for Tackling Piecewise-Stationary Bandits.
J. Mach. Learn. Res., 2022

Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration and Planning.
CoRR, 2022

gym-DSSAT: a crop model turned into a Reinforcement Learning environment.
CoRR, 2022

Bandits Corrupted by Nature: Lower Bounds on Regret and Robust Optimistic Algorithm.
CoRR, 2022

A channel selection game for multi-operator LoRaWAN deployments.
Comput. Networks, 2022

Reinforcement learning for crop management support: Review, prospects and challenges.
Comput. Electron. Agric., 2022

IMED-RL: Regret optimal learning of ergodic Markov decision processes.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
From Optimality to Robustness: Dirichlet Sampling Strategies in Stochastic Bandits.
CoRR, 2021

Routine Bandits: Minimizing Regret on Recurring Problems.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2021

Indexed Minimum Empirical Divergence for Unimodal Bandits.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Stochastic bandits with groups of similar arms.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Stochastic Online Linear Regression: the Forward Algorithm to Replace Ridge.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

From Optimality to Robustness: Adaptive Re-Sampling Strategies in Stochastic Bandits.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Optimal Thompson Sampling strategies for support-aware CVaR bandits.
Proceedings of the 38th International Conference on Machine Learning, 2021

Learning Value Functions in Deep Policy Gradients using Residual Variance.
Proceedings of the 9th International Conference on Learning Representations, 2021

Improved Exploration in Factored Average-Reward MDPs.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Reinforcement Learning in Parametric MDPs with Exponential Families.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Thompson Sampling for CVaR Bandits.
CoRR, 2020

Is Standard Deviation the New Standard? Revisiting the Critic in Deep Policy Gradients.
CoRR, 2020

Optimal Strategies for Graph-Structured Bandits.
CoRR, 2020

Forced-exploration free Strategies for Unimodal Bandits.
CoRR, 2020

Robust Estimation, Prediction and Control with Linear Dynamics and Generic Costs.
CoRR, 2020

Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Sub-sampling for Efficient Non-Parametric Bandit Exploration.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Tightening Exploration in Upper Confidence Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Restarted Bayesian Online Change-point Detector achieves Optimal Detection Delay.
Proceedings of the 37th International Conference on Machine Learning, 2020

Robust-Adaptive Interval Predictive Control for Linear Uncertain Systems.
Proceedings of the 59th IEEE Conference on Decision and Control, 2020

Monte-Carlo Graph Search: the Value of Merging Similar States.
Proceedings of The 12th Asian Conference on Machine Learning, 2020

2019
Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d Bandits.
CoRR, 2019

Scaling up budgeted reinforcement learning.
CoRR, 2019

Approximate Robust Control of Uncertain Dynamical Systems.
CoRR, 2019

Practical Open-Loop Optimistic Planning.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2019

Learning Multiple Markov Chains via Adaptive Allocation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Regret Bounds for Learning State Representations in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Budgeted Reinforcement Learning in Continuous State Space.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds.
Proceedings of the Algorithmic Learning Theory, 2019

Model-Based Reinforcement Learning Exploiting State-Action Equivalence.
Proceedings of The 11th Asian Conference on Machine Learning, 2019

Mathematics of Statistical Sequential Decision Making. (Mathématique de la prise de décision séquentielle statistique).
, 2019

2018
Streaming kernel regression with provably adaptive mean, variance, and regularization.
J. Mach. Learn. Res., 2018

Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs.
Proceedings of the Algorithmic Learning Theory, 2018

2017
The non-stationary stochastic multi-armed bandit problem.
Int. J. Data Sci. Anal., 2017

Spectral Learning from a Single Trajectory under Finite-State Policies.
Proceedings of the 34th International Conference on Machine Learning, 2017

Efficient tracking of a growing number of experts.
Proceedings of the International Conference on Algorithmic Learning Theory, 2017

Boundary Crossing for General Exponential Families.
Proceedings of the International Conference on Algorithmic Learning Theory, 2017

2016
Low-rank Bandits with Latent Mixtures.
CoRR, 2016

Random Shuffling and Resets for the Non-stationary Stochastic Bandit Problem.
CoRR, 2016

Pliable Rejection Sampling.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2014
Sub-sampling for Multi-armed Bandits.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2014

"How hard is my MDP?" The distribution-norm to the rescue.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Latent Bandits.
Proceedings of the 31st International Conference on Machine Learning, 2014

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning.
Proceedings of the Algorithmic Learning Theory - 25th International Conference, 2014

2013
Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning.
Proceedings of the 30th International Conference on Machine Learning, 2013

Robust Risk-Averse Stochastic Multi-armed Bandits.
Proceedings of the Algorithmic Learning Theory - 24th International Conference, 2013

Competing with an Infinite Set of Models in Reinforcement Learning.
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, 2013

2012
Linear regression with random projections.
J. Mach. Learn. Res., 2012

Hierarchical Optimistic Region Selection driven by Curiosity.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, 2012

Online allocation and homogeneous partitioning for piecewise constant mean-approximation.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, 2012

2011
Sequential Learning: Bandits, Statistics and Reinforcement. (Apprentissage séquentiel : Bandits, Statistique et Renforcement).
PhD thesis, 2011

A Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.
Proceedings of the COLT 2011, 2011

Adaptive Bandits: Towards the best history-dependent strategy.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Selecting the State-Representation in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Sparse Recovery with Brownian Sensing.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

2010
Finite-sample Analysis of Bellman Residual Minimization.
Proceedings of the 2nd Asian Conference on Machine Learning, 2010

Online Learning in Adversarial Lipschitz Environments.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2010

Scrambled Objects for Least-Squares Regression.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

LSTD with Random Projections.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

2009
Compressed Least-Squares Regression.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Complexity versus Agreement for Many Views.
Proceedings of the Algorithmic Learning Theory, 20th International Conference, 2009

