Alekh Agarwal

ORCID: 0000-0001-7032-7162

According to our database, Alekh Agarwal authored at least 130 papers between 2006 and 2024.

Bibliography

2024
Model-Free Representation Learning and Exploration in Low-Rank MDPs.
J. Mach. Learn. Res., 2024

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning.
CoRR, 2024

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning.
CoRR, 2024

Robust Preference Optimization through Reward Model Distillation.
CoRR, 2024

Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization.
CoRR, 2024

Theoretical guarantees on the best-of-n alignment policy.
CoRR, 2024

Efficient End-to-End Visual Document Understanding with Rationale Distillation.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

More Benefits of Being Distributional: Second-Order Bounds for Reinforcement Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

A Minimaximalist Approach to Reinforcement Learning from Human Feedback.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Non-linear F-Design and Applications to Interactive Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning.
Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks.
Proceedings of the International Conference on Algorithmic Learning Theory, 2024

2023
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking.
CoRR, 2023

Peer Reviews of Peer Reviews: A Randomized Controlled Trial and Other Experiments.
CoRR, 2023

An Empirical Evaluation of Federated Contextual Bandit Algorithms.
CoRR, 2023

Leveraging User-Triggered Supervision in Contextual Bandits.
CoRR, 2023

Ordering-based Conditions for Global Convergence of Policy Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Stochastic Gradient Succeeds for Bandits.
Proceedings of the International Conference on Machine Learning, 2023

Learning in POMDPs is Sample-Efficient with Hindsight Observability.
Proceedings of the International Conference on Machine Learning, 2023

VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

Provable Benefits of Representational Transfer in Reinforcement Learning.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

2022
On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning approach.
Proceedings of the International Conference on Machine Learning, 2022

Adversarially Trained Actor Critic for Offline Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Minimax Regret Optimization for Robust Machine Learning under Distribution Shift.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

2021
A Contextual Bandit Bake-off.
J. Mach. Learn. Res., 2021

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift.
J. Mach. Learn. Res., 2021

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics.
CoRR, 2021

Bellman-consistent Pessimism for Offline Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provably Correct Optimization and Exploration with Non-linear Policies.
Proceedings of the 38th International Conference on Machine Learning, 2021

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation.
Proceedings of the Conference on Learning Theory, 2021

Towards a Dimension-Free Understanding of Adaptive Linear Control.
Proceedings of the Conference on Learning Theory, 2021

2020
Provably Good Batch Reinforcement Learning Without Great Exploration.
CoRR, 2020

Policy Improvement from Multiple Experts.
CoRR, 2020

Optimizing Interactive Systems via Data-Driven Objectives.
CoRR, 2020

Reparameterized Variational Divergence Minimization for Stable Imitation.
CoRR, 2020

Federated Residual Learning.
CoRR, 2020

Safe Reinforcement Learning via Curriculum Induction.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Policy Improvement via Imitation of Multiple Oracles.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds.
Proceedings of the 8th International Conference on Learning Representations, 2020

Taking a hint: How to leverage loss predictors in contextual bandits?
Proceedings of the Conference on Learning Theory, 2020

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal.
Proceedings of the Conference on Learning Theory, 2020

Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes.
Proceedings of the Conference on Learning Theory, 2020

Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Active Learning for Cost-Sensitive Classification.
J. Mach. Learn. Res., 2019

On the Optimality of Sparse Model-Based Planning for Markov Decision Processes.
CoRR, 2019

Off-Policy Policy Gradient with State Distribution Correction.
CoRR, 2019

Off-Policy Policy Gradient with Stationary Distribution Correction.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback.
Proceedings of the 36th International Conference on Machine Learning, 2019

Provably efficient RL with Rich Observations via Latent State Decoding.
Proceedings of the 36th International Conference on Machine Learning, 2019

Fair Regression: Quantitative Definitions and Reduction-Based Algorithms.
Proceedings of the 36th International Conference on Machine Learning, 2019

Bias Correction of Learned Generative Models via Likelihood-free Importance Weighting.
Proceedings of the Workshop on Deep Generative Models for Highly Structured Data, 2019

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches.
Proceedings of the Conference on Learning Theory, 2019

2018
Model-Based Reinforcement Learning in Contextual Decision Processes.
CoRR, 2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations.
CoRR, 2018

Practical Evaluation and Optimization of Contextual Bandit Algorithms.
CoRR, 2018

On Oracle-Efficient PAC RL with Rich Observations.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Practical Contextual Bandits with Regression Oracles.
Proceedings of the 35th International Conference on Machine Learning, 2018

A Reductions Approach to Fair Classification.
Proceedings of the 35th International Conference on Machine Learning, 2018

Hierarchical Imitation and Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Efficient Contextual Bandits in Non-stationary Worlds.
Proceedings of the Conference on Learning Theory, 2018

Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon.
Proceedings of the Conference on Learning Theory, 2018

2017
A Clustering Approach to Learning Sparsely Used Overcomplete Dictionaries.
IEEE Trans. Inf. Theory, 2017

Efficient Contextual Bandits in Non-stationary Worlds.
CoRR, 2017

Off-policy evaluation for slate recommendation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits.
Proceedings of the 34th International Conference on Machine Learning, 2017

Contextual Decision Processes with low Bellman rank are PAC-Learnable.
Proceedings of the 34th International Conference on Machine Learning, 2017

Corralling a Band of Bandit Algorithms.
Proceedings of the 30th Conference on Learning Theory, 2017

Open Problem: First-Order Regret Bounds for Contextual Bandits.
Proceedings of the 30th Conference on Learning Theory, 2017

2016
Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization.
SIAM J. Optim., 2016

Efficient Second Order Online Learning via Sketching.
CoRR, 2016

Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations.
CoRR, 2016

A Multiworld Testing Decision Service.
CoRR, 2016

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains.
CoRR, 2016

Efficient Second Order Online Learning by Sketching.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

PAC Reinforcement Learning with Rich Observations.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Contextual semibandits via supervised learning oracles.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
Efficient Contextual Semi-Bandit Learning.
CoRR, 2015

Fast Convergence of Regularized Learning in Games.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Efficient and Parsimonious Agnostic Active Learning.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Learning to Search Better than Your Teacher.
Proceedings of the 32nd International Conference on Machine Learning, 2015

A Lower Bound for the Optimization of Finite Sums.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
A reliable effective terascale linear learning system.
J. Mach. Learn. Res., 2014

Scalable Nonlinear Learning with Adaptive Polynomial Expansions.
CoRR, 2014

Scalable Non-linear Learning with Adaptive Polynomial Expansions.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Least Squares Revisited: Scalable Approaches for Multi-class Prediction.
Proceedings of the 31st International Conference on Machine Learning, 2014

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits.
Proceedings of the 31st International Conference on Machine Learning, 2014

Robust Multi-objective Learning with Mentor Feedback.
Proceedings of the 27th Conference on Learning Theory, 2014

Learning Sparsely Used Overcomplete Dictionaries.
Proceedings of the 27th Conference on Learning Theory, 2014

Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions.
Proceedings of the 48th Annual Conference on Information Sciences and Systems, 2014

2013
The Generalization Ability of Online Algorithms for Dependent Data.
IEEE Trans. Inf. Theory, 2013

Stochastic Convex Optimization with Bandit Feedback.
SIAM J. Optim., 2013

Para-active learning.
CoRR, 2013

Exact Recovery of Sparsely Used Overcomplete Dictionaries.
CoRR, 2013

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization.
CoRR, 2013

Selective sampling algorithms for cost-sensitive multiclass prediction.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
Computational Trade-offs in Statistical Learning.
PhD thesis, 2012

Information-Theoretic Lower Bounds on the Oracle Complexity of Stochastic Convex Optimization.
IEEE Trans. Inf. Theory, 2012

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling.
IEEE Trans. Autom. Control., 2012

Ergodic Mirror Descent.
SIAM J. Optim., 2012

Contextual Bandit Learning with Predictable Rewards.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Oracle inequalities for computationally adaptive model selection.
CoRR, 2012

Fast global convergence of gradient methods for solving regularized M-estimation.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2012

Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, 2012

Dual averaging for distributed optimization.
Proceedings of the 50th Annual Allerton Conference on Communication, Control, and Computing, 2012

2011
Oracle inequalities for computationally budgeted model selection.
Proceedings of COLT 2011, 2011

Fast global convergence of gradient methods for high-dimensional statistical recovery.
CoRR, 2011

Online and Batch Learning Algorithms for Data with Missing Features.
CoRR, 2011

Learning with Missing Features.
Proceedings of UAI 2011, 2011

Distributed Delayed Stochastic Optimization.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions.
Proceedings of the 28th International Conference on Machine Learning, 2011

2010
Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes.
J. Mach. Learn. Res., 2010

Optimal Allocation Strategies for the Dark Pool Problem.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Distributed Dual Averaging In Networks.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Fast global convergence rates of gradient methods for high-dimensional statistical recovery.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback.
Proceedings of COLT 2010, 2010

2009
Information-theoretic lower bounds on the oracle complexity of convex optimization.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

A Stochastic View of Optimal Regret through Minimax Duality.
Proceedings of COLT 2009, 2009

2008
Message-passing for graph-structured linear programs: proximal projections, convergence and rounding schemes.
Proceedings of the 25th International Conference on Machine Learning, 2008

2007
An Analysis of Inference with the Universum.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Learning random walks to rank nodes in graphs.
Proceedings of the 24th International Conference on Machine Learning, 2007

2006
Learning Parameters in Entity Relationship Graphs from Ranking Preferences.
Proceedings of Knowledge Discovery in Databases: PKDD 2006, 2006

Learning to rank networked entities.
Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
