Michal Valko

CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.

Bernardo Ávila Pires

Bilal Piot

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Nash Learning from Human Feedback.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Decoding-time Realignment of Language Models.

Felipe Llinares-López

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Demonstration-Regularized RL.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unlocking the Power of Representations in Long-term Novelty-based Exploration.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023

Nash Learning from Human Feedback.

CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.

CoRR, 2023

Local and adaptive mirror descents in extensive-form games.

CoRR, 2023

Model-free Posterior Sampling via Learning Rate Randomization.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fast Rates for Maximum Entropy Exploration.

Proceedings of the International Conference on Machine Learning, 2023

VA-learning as a more efficient alternative to Q-learning.

Proceedings of the International Conference on Machine Learning, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.

Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.

Yunhao Tang

Zhaohan Daniel Guo

Proceedings of the International Conference on Machine Learning, 2023

Quantile Credit Assignment.

Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.

Proceedings of the International Conference on Machine Learning, 2023

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments.

Proceedings of the International Conference on Machine Learning, 2023

Adapting to game trees in zero-sum imperfect information games.

Proceedings of the International Conference on Machine Learning, 2023

Half-Hop: A graph upsampling approach for slowing down message passing.

Proceedings of the International Conference on Machine Learning, 2023

2022

Curiosity in hindsight.

CoRR, 2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.

CoRR, 2022

Retrieval-Augmented Reinforcement Learning.

Adrià Puigdomènech Badia

CoRR, 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.

Bilal Piot

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.

Proceedings of the International Conference on Machine Learning, 2022

Retrieval-Augmented Reinforcement Learning.

Adrià Puigdomènech Badia

Arthur Guez

Mehdi Mirza

Peter Conway Humphreys

Proceedings of the International Conference on Machine Learning, 2022

Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times.

Proceedings of the International Conference on Machine Learning, 2022

Large-Scale Representation Learning on Graphs via Bootstrapping.

Shantanu Thakoor

Corentin Tallec

Proceedings of the Tenth International Conference on Learning Representations, 2022

Adaptive Multi-Goal Exploration.

Jean Tarbouriech

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Marginalized Operators for Off-policy Reinforcement Learning.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Fast sampling from β-ensembles.

Stat. Comput., 2021

Game Plan: What AI can do for Football, and What Football can do for AI.

J. Artif. Intell. Res., 2021

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall.

CoRR, 2021

Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction.

Mehdi Azabou

William R. Gray Roncal

Eva L. Dyer

CoRR, 2021

Bootstrapped Representation Learning on Graphs.

Shantanu Thakoor

Corentin Tallec

Petar Velickovic

CoRR, 2021

Geometric Entropic Exploration.

Zhaohan Daniel Guo

CoRR, 2021

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions.

CoRR, 2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Provably Efficient Sample Collection Strategy for Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity.

Keith B. Hengen

Eva L. Dyer

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning in two-player zero-sum partially observable Markov games with perfect recall.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Taylor Expansion of Discount Factors.

Proceedings of the 38th International Conference on Machine Learning, 2021

UCB Momentum Q-learning: Correcting the bias without forgetting.

Xuedong Shang

Proceedings of the 38th International Conference on Machine Learning, 2021

Fast active learning for pure exploration in reinforcement learning.

Proceedings of the 38th International Conference on Machine Learning, 2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning.

Proceedings of the 38th International Conference on Machine Learning, 2021

Online A-Optimal Design and Active Linear Regression.

Proceedings of the 38th International Conference on Machine Learning, 2021

Kernel-Based Reinforcement Learning: A Finite-Time Analysis.

Proceedings of the 38th International Conference on Machine Learning, 2021

Broaden Your Views for Self-Supervised Video Learning.

Adrià Recasens

Pauline Luc

Jean-Baptiste Alayrac

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model.

Proceedings of the Algorithmic Learning Theory, 2021

Adaptive Reward-Free Exploration.

Anders Jonsson

Edouard Leurent

Proceedings of the Algorithmic Learning Theory, 2021

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited.

Proceedings of the Algorithmic Learning Theory, 2021

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020

BYOL works even without batch statistics.

CoRR, 2020

Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards.

Aadirupa Saha

Pierre Gaillard

CoRR, 2020

Regret Bounds for Kernel-Based Reinforcement Learning.

CoRR, 2020

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity.

Anders Jonsson

Edouard Leurent

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Sampling from a k-DPP without looking at all items.

Michal Derezinski

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

No-Regret Exploration in Goal-Oriented Reinforcement Learning.

Proceedings of the 37th International Conference on Machine Learning, 2020

Taylor Expansion Policy Optimization.

Yunhao Tang

Proceedings of the 37th International Conference on Machine Learning, 2020

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards.

Aadirupa Saha

Pierre Gaillard

Proceedings of the 37th International Conference on Machine Learning, 2020

Budgeted Online Influence Maximization.

Proceedings of the 37th International Conference on Machine Learning, 2020

Stochastic bandits with arm-dependent delays.

Proceedings of the 37th International Conference on Machine Learning, 2020

Monte-Carlo Tree Search as Regularized Policy Optimization.

Proceedings of the 37th International Conference on Machine Learning, 2020

Gamification of Pure Exploration for Linear Bandits.

Proceedings of the 37th International Conference on Machine Learning, 2020

Near-linear time Gaussian process optimization with adaptive batching and resparsification.

Proceedings of the 37th International Conference on Machine Learning, 2020

Covariance-adapting algorithm for semi-bandits with application to sparse outcomes.

Proceedings of the Conference on Learning Theory, 2020

Fixed-confidence guarantees for Bayesian best-arm identification.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

A single algorithm for both restless and rested rotting bandits.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Adaptive multi-fidelity optimization with fast learning rates.

Côme Fiegel

Victor Gabillon

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Derivative-Free & Order-Robust Optimisation.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019

DPPy: DPP Sampling with Python.

J. Mach. Learn. Res., 2019

Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits.

CoRR, 2019

Multiagent Evaluation under Incomplete Information.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Planning in entropy-regularized Markov decision processes and games.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On two ways to use determinantal point processes for Monte Carlo integration.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exact sampling of determinantal point processes with sublinear time preprocessing.

Michal Derezinski

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exploiting structure of uncertainty for efficient matroid semi-bandits.

Proceedings of the 36th International Conference on Machine Learning, 2019

Scale-free adaptive planning for deterministic dynamics & discounted rewards.

Proceedings of the 36th International Conference on Machine Learning, 2019

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret.

Proceedings of the Conference on Learning Theory, 2019

General parallel optimization without a metric.

Xuedong Shang

Proceedings of the Algorithmic Learning Theory, 2019

A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption.

Peter L. Bartlett

Victor Gabillon

Proceedings of the Algorithmic Learning Theory, 2019

Rotting bandits are no harder than stochastic ones.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Finding the bandit in a graph: Sequential search-and-stop.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Active multiple matrix completion with adaptive confidence sets.

Andrea Locatelli

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

DPPy: Sampling Determinantal Point Processes with Python.

CoRR, 2018

Optimistic optimization of a Brownian.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Preface.

Fabrice Popineau

Jill-Jênn Vie

Proceedings of the 14th International Conference on Intelligent Tutoring Systems 2018 Workshops, 2018

Improved Large-Scale Graph Learning through Ridge Spectral Sparsification.

Proceedings of the 35th International Conference on Machine Learning, 2018

Compressing the Input for CNNs with the First-Order Scattering Transform.

Proceedings of the European Conference on Computer Vision (ECCV 2018), 2018

Best of both worlds: Stochastic & adversarial best-arm identification.

Proceedings of the Conference on Learning Theory, 2018

2017

Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Efficient Second-Order Online Kernel Learning with Adaptive Embedding.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Zonotope Hit-and-run for Efficient Sampling from Projection DPPs.

Proceedings of the 34th International Conference on Machine Learning, 2017

Second-Order Kernel Online Convex Optimization with Adaptive Sketching.

Proceedings of the 34th International Conference on Machine Learning, 2017

Trading off Rewards and Errors in Multi-Armed Bandits.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Distributed Adaptive Sampling for Kernel Matrix Approximation.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016

Bayesian Policy Gradient and Actor-Critic Algorithms.

Mohammad Ghavamzadeh

Yaakov Engel

J. Mach. Learn. Res., 2016

Influence Maximization with Semi-Bandit Feedback.

Zheng Wen

Branislav Kveton

CoRR, 2016

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning.

CoRR, 2016

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting.

CoRR, 2016

Online learning with Erdos-Renyi side-observation graphs.

Tomás Kocák

Gergely Neu

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Analysis of Nyström method with sequential ridge leverage scores.

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Pliable Rejection Sampling.

Akram Erraqabi

Odalric-Ambrym Maillard

Proceedings of the 33rd International Conference on Machine Learning, 2016

Online Learning with Noisy Side Observations.

Tomás Kocák

Gergely Neu

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

Revealing Graph Bandits for Maximizing Local Influence.

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015

Black-box optimization of noisy functions with unknown smoothness.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Maximum Entropy Semi-Supervised Inverse Reinforcement Learning.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Cheap Bandits.

Manjesh Kumar Hanawal

Venkatesh Saligrama

Proceedings of the 32nd International Conference on Machine Learning, 2015

Simple regret for infinitely many armed bandits.

Proceedings of the 32nd International Conference on Machine Learning, 2015

2014

Learning to Act Greedily: Polymatroid Semi-Bandits.

CoRR, 2014

Online combinatorial optimization with stochastic decision sets and adversarial losses.

Gergely Neu

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Efficient learning by implicit exploration in bandit problems with side observations.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Extreme bandits.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Spectral Bandits for Smooth Graph Functions.

Proceedings of the 31st International Conference on Machine Learning, 2014

Bandits attack function optimization.

Philippe Preux

Proceedings of the IEEE Congress on Evolutionary Computation, 2014

Spectral Thompson Sampling.

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013

Outlier detection for patient monitoring and alerting.

J. Biomed. Informatics, 2013

Finite-Time Analysis of Kernelised Contextual Bandits.

Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013

Stochastic Simultaneous Optimistic Optimization.

Proceedings of the 30th International Conference on Machine Learning, 2013

Learning from a single labeled face and a stream of unlabeled data.

Branislav Kveton

Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2013

2012

Semi-Supervised Apprenticeship Learning.

Mohammad Ghavamzadeh

Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

2011

Conditional Anomaly Detection with Soft Harmonic Functions.

Proceedings of the 11th IEEE International Conference on Data Mining, 2011

2010

Semi-Supervised Learning with Max-Margin Graph Cuts.

Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Online Semi-Supervised Learning on Quantized Graphs.

Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI 2010), 2010

Feature importance analysis for patient management decisions.

Milos Hauskrecht

Proceedings of MEDINFO 2010, 2010

Online semi-supervised perception: Real-time learning without explicit feedback.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010

2008

Distance Metric Learning for Conditional Anomaly Detection.
