Michal Valko

Affiliations:
  • DeepMind


According to our database, Michal Valko authored at least 140 papers between 2007 and 2024.

Bibliography

2024
Optimal Design for Reward Modeling in RLHF.
CoRR, 2024

Preference Optimization with Multi-Sample Comparisons.
CoRR, 2024

A New Bound on the Cumulant Generating Function of Dirichlet Processes.
CoRR, 2024

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving.
CoRR, 2024

Understanding the performance gap between online and offline alignment algorithms.
CoRR, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.
CoRR, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024


Decoding-time Realignment of Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Demonstration-Regularized RL.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unlocking the Power of Representations in Long-term Novelty-based Exploration.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

A General Theoretical Paradigm to Understand Learning from Human Preferences.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Nash Learning from Human Feedback.
CoRR, 2023

A General Theoretical Paradigm to Understand Learning from Human Preferences.
CoRR, 2023

Local and adaptive mirror descents in extensive-form games.
CoRR, 2023

Model-free Posterior Sampling via Learning Rate Randomization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fast Rates for Maximum Entropy Exploration.
Proceedings of the International Conference on Machine Learning, 2023

VA-learning as a more efficient alternative to Q-learning.
Proceedings of the International Conference on Machine Learning, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.
Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

Quantile Credit Assignment.
Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.
Proceedings of the International Conference on Machine Learning, 2023

Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments.
Proceedings of the International Conference on Machine Learning, 2023

Adapting to game trees in zero-sum imperfect information games.
Proceedings of the International Conference on Machine Learning, 2023

Half-Hop: A graph upsampling approach for slowing down message passing.
Proceedings of the International Conference on Machine Learning, 2023

2022
Curiosity in hindsight.
CoRR, 2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.
CoRR, 2022

Retrieval-Augmented Reinforcement Learning.
CoRR, 2022

Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.
Proceedings of the International Conference on Machine Learning, 2022


Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times.
Proceedings of the International Conference on Machine Learning, 2022

Large-Scale Representation Learning on Graphs via Bootstrapping.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Adaptive Multi-Goal Exploration.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Marginalized Operators for Off-policy Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Fast sampling from β-ensembles.
Stat. Comput., 2021

Game Plan: What AI can do for Football, and What Football can do for AI.
J. Artif. Intell. Res., 2021

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall.
CoRR, 2021

Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction.
CoRR, 2021

Bootstrapped Representation Learning on Graphs.
CoRR, 2021

Geometric Entropic Exploration.
CoRR, 2021

On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions.
CoRR, 2021

Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Provably Efficient Sample Collection Strategy for Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning in two-player zero-sum partially observable Markov games with perfect recall.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Taylor Expansion of Discount Factors.
Proceedings of the 38th International Conference on Machine Learning, 2021

UCB Momentum Q-learning: Correcting the bias without forgetting.
Proceedings of the 38th International Conference on Machine Learning, 2021

Fast active learning for pure exploration in reinforcement learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Online A-Optimal Design and Active Linear Regression.
Proceedings of the 38th International Conference on Machine Learning, 2021

Kernel-Based Reinforcement Learning: A Finite-Time Analysis.
Proceedings of the 38th International Conference on Machine Learning, 2021

Broaden Your Views for Self-Supervised Video Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model.
Proceedings of the Algorithmic Learning Theory, 2021

Adaptive Reward-Free Exploration.
Proceedings of the Algorithmic Learning Theory, 2021

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited.
Proceedings of the Algorithmic Learning Theory, 2021

A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
BYOL works even without batch statistics.
CoRR, 2020

Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards.
CoRR, 2020

Regret Bounds for Kernel-Based Reinforcement Learning.
CoRR, 2020

Improved Sample Complexity for Incremental Autonomous Exploration in MDPs.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Planning in Markov Decision Processes with Gap-Dependent Sample Complexity.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Sampling from a k-DPP without looking at all items.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

No-Regret Exploration in Goal-Oriented Reinforcement Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

Taylor Expansion Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards.
Proceedings of the 37th International Conference on Machine Learning, 2020

Budgeted Online Influence Maximization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Stochastic bandits with arm-dependent delays.
Proceedings of the 37th International Conference on Machine Learning, 2020

Monte-Carlo Tree Search as Regularized Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Gamification of Pure Exploration for Linear Bandits.
Proceedings of the 37th International Conference on Machine Learning, 2020

Near-linear time Gaussian process optimization with adaptive batching and resparsification.
Proceedings of the 37th International Conference on Machine Learning, 2020

Covariance-adapting algorithm for semi-bandits with application to sparse outcomes.
Proceedings of the Conference on Learning Theory, 2020

Fixed-confidence guarantees for Bayesian best-arm identification.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

A single algorithm for both restless and rested rotting bandits.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Adaptive multi-fidelity optimization with fast learning rates.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Derivative-Free & Order-Robust Optimisation.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
DPPy: DPP Sampling with Python.
J. Mach. Learn. Res., 2019

Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits.
CoRR, 2019

Multiagent Evaluation under Incomplete Information.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Planning in entropy-regularized Markov decision processes and games.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On two ways to use determinantal point processes for Monte Carlo integration.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exact sampling of determinantal point processes with sublinear time preprocessing.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exploiting structure of uncertainty for efficient matroid semi-bandits.
Proceedings of the 36th International Conference on Machine Learning, 2019

Scale-free adaptive planning for deterministic dynamics & discounted rewards.
Proceedings of the 36th International Conference on Machine Learning, 2019

Gaussian Process Optimization with Adaptive Sketching: Scalable and No Regret.
Proceedings of the Conference on Learning Theory, 2019

General parallel optimization without a metric.
Proceedings of the Algorithmic Learning Theory, 2019

A simple parameter-free and adaptive approach to optimization under a minimal local smoothness assumption.
Proceedings of the Algorithmic Learning Theory, 2019

Rotting bandits are no harder than stochastic ones.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Finding the bandit in a graph: Sequential search-and-stop.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Active multiple matrix completion with adaptive confidence sets.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
DPPy: Sampling Determinantal Point Processes with Python.
CoRR, 2018

Optimistic optimization of a Brownian.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Preface.
Proceedings of the 14th International Conference on Intelligent Tutoring Systems 2018 Workshops, 2018

Improved Large-Scale Graph Learning through Ridge Spectral Sparsification.
Proceedings of the 35th International Conference on Machine Learning, 2018

Compressing the Input for CNNs with the First-Order Scattering Transform.
Proceedings of the Computer Vision - ECCV 2018, 2018

Best of both worlds: Stochastic & adversarial best-arm identification.
Proceedings of the Conference On Learning Theory, 2018

2017
Online Influence Maximization under Independent Cascade Model with Semi-Bandit Feedback.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Efficient Second-Order Online Kernel Learning with Adaptive Embedding.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Zonotope Hit-and-run for Efficient Sampling from Projection DPPs.
Proceedings of the 34th International Conference on Machine Learning, 2017

Second-Order Kernel Online Convex Optimization with Adaptive Sketching.
Proceedings of the 34th International Conference on Machine Learning, 2017

Trading off Rewards and Errors in Multi-Armed Bandits.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Distributed Adaptive Sampling for Kernel Matrix Approximation.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Bayesian Policy Gradient and Actor-Critic Algorithms.
J. Mach. Learn. Res., 2016

Influence Maximization with Semi-Bandit Feedback.
CoRR, 2016

Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning.
CoRR, 2016

Analysis of Kelner and Levin graph sparsification algorithm for a streaming setting.
CoRR, 2016

Online learning with Erdős-Rényi side-observation graphs.
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Analysis of Nyström method with sequential ridge leverage scores.
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Pliable Rejection Sampling.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Online Learning with Noisy Side Observations.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

Revealing Graph Bandits for Maximizing Local Influence.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

2015
Black-box optimization of noisy functions with unknown smoothness.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Maximum Entropy Semi-Supervised Inverse Reinforcement Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Cheap Bandits.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Simple regret for infinitely many armed bandits.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Learning to Act Greedily: Polymatroid Semi-Bandits.
CoRR, 2014

Online combinatorial optimization with stochastic decision sets and adversarial losses.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Efficient learning by implicit exploration in bandit problems with side observations.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Extreme bandits.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Spectral Bandits for Smooth Graph Functions.
Proceedings of the 31st International Conference on Machine Learning, 2014

Bandits attack function optimization.
Proceedings of the IEEE Congress on Evolutionary Computation, 2014

Spectral Thompson Sampling.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
Outlier detection for patient monitoring and alerting.
J. Biomed. Informatics, 2013

Finite-Time Analysis of Kernelised Contextual Bandits.
Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence, 2013

Stochastic Simultaneous Optimistic Optimization.
Proceedings of the 30th International Conference on Machine Learning, 2013

Learning from a single labeled face and a stream of unlabeled data.
Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2013

2012
Semi-Supervised Apprenticeship Learning.
Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

2011
Conditional Anomaly Detection with Soft Harmonic Functions.
Proceedings of the 11th IEEE International Conference on Data Mining, 2011

2010
Semi-Supervised Learning with Max-Margin Graph Cuts.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Online Semi-Supervised Learning on Quantized Graphs.
Proceedings of the UAI 2010, 2010

Feature importance analysis for patient management decisions.
Proceedings of the MEDINFO 2010, 2010

Online semi-supervised perception: Real-time learning without explicit feedback.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010

2008
Distance Metric Learning for Conditional Anomaly Detection.
Proceedings of the Twenty-First International Florida Artificial Intelligence Research Society Conference, 2008

2007
Evidence-based Anomaly Detection in Clinical Domains.
Proceedings of the AMIA 2007, 2007

