Csaba Szepesvári

ORCID: 0000-0002-9286-2892

Affiliations:
  • University of Alberta


According to our database, Csaba Szepesvári authored at least 306 papers between 1993 and 2024.

Bibliography

2024
Almost Free: Self-concordance in Natural Exponential Families and an Application to Bandits.
CoRR, 2024

Confident Natural Policy Gradient for Local Planning in $q_\pi$-realizable Constrained MDPs.
CoRR, 2024

To Believe or Not to Believe Your LLM.
CoRR, 2024

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability.
CoRR, 2024

Mitigating LLM Hallucinations via Conformal Abstention.
CoRR, 2024

Switching the Loss Reduces the Cost in Batch Reinforcement Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Stochastic Gradient Descent for Gaussian Processes Done Right.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Exploration via linearly perturbed loss minimisation.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Ensemble sampling for linear bandits: small ensembles suffice.
CoRR, 2023

Sample Efficient Deep Reinforcement Learning via Local Planning.
CoRR, 2023

Optimistic MLE: A Generic Model-Based Algorithm for Partially Observable Sequential Decision Making.
Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023

Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Ordering-based Conditions for Global Convergence of Policy Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Context-lumpable stochastic bandits.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Regret Minimization via Saddle Point Optimization.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisiting Simple Regret: Fast Rates for Returning a Good Arm.
Proceedings of the International Conference on Machine Learning, 2023

Stochastic Gradient Succeeds for Bandits.
Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.
Proceedings of the International Conference on Machine Learning, 2023

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation.
Proceedings of the International Conference on Machine Learning, 2023

Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Exponential Hardness of Reinforcement Learning with Linear Function Approximation.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022
Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks.
CoRR, 2022

Revisiting Simple Regret Minimization in Multi-Armed Bandits.
CoRR, 2022

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs.
CoRR, 2022

KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.
CoRR, 2022

Towards Painless Policy Optimization for Constrained MDPs.
CoRR, 2022

A free lunch from the noise: Provable and practical exploration for representation learning.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Confident Approximate Policy Iteration for Efficient Local Planning in $q^\pi$-realizable MDPs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Near-Optimal Sample Complexity Bounds for Constrained MDPs.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Role of Baselines in Policy Gradient Optimization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

When Is Partially Observable Reinforcement Learning Not Scary?
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Efficient local planning with linear function approximation.
Proceedings of the International Conference on Algorithmic Learning Theory, 29 March, 2022

TensorPlan and the Few Actions Lower Bound for Planning in MDPs under Linear Realizability of Optimal Value Functions.
Proceedings of the International Conference on Algorithmic Learning Theory, 29 March, 2022

The Curse of Passive Data Collection in Batch Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Faster Rates, Adaptive Algorithms, and Finite-Time Bounds for Linear Composition Optimization and Gradient TD Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

Confident Least Square Value Iteration with Local Access to a Simulator.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Guest editorial: special issue on reinforcement learning for real life.
Mach. Learn., 2021

Tighter Risk Certificates for Neural Networks.
J. Mach. Learn. Res., 2021

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs.
CoRR, 2021

On the Sample Complexity of Batch Reinforcement Learning with Policy-Induced Data.
CoRR, 2021

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning.
CoRR, 2021

Optimization Issues in KL-Constrained Approximate Policy Iteration.
CoRR, 2021

Bootstrapping Statistical Inference for Off-Policy Evaluation.
CoRR, 2021

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Understanding the Effect of Stochasticity in Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Role of Optimization in Double Descent: A Least Squares Study.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

No Regrets for Learning the Prior in Bandits.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Optimality of Batch Policy Optimization Algorithms.
Proceedings of the 38th International Conference on Machine Learning, 2021

Leveraging Non-uniformity in First-order Non-convex Optimization.
Proceedings of the 38th International Conference on Machine Learning, 2021

Improved Regret Bound and Experience Replay in Regularized Policy Iteration.
Proceedings of the 38th International Conference on Machine Learning, 2021

Meta-Thompson Sampling.
Proceedings of the 38th International Conference on Machine Learning, 2021

A Distribution-dependent Analysis of Meta Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference.
Proceedings of the 38th International Conference on Machine Learning, 2021

Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient.
Proceedings of the 38th International Conference on Machine Learning, 2021

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes.
Proceedings of the Conference on Learning Theory, 2021

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function.
Proceedings of the Conference on Learning Theory, 2021

Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping.
Proceedings of the Conference on Learning Theory, 2021

Asymptotically Optimal Information-Directed Sampling.
Proceedings of the Conference on Learning Theory, 2021

Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable Optimal Action-Value Functions.
Proceedings of the Algorithmic Learning Theory, 2021

Confident Off-Policy Evaluation and Selection through Self-Normalized Importance Weighting.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Online Sparse Reinforcement Learning.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Adaptive Approximate Policy Iteration.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
A modular analysis of adaptive (non-)convex optimization: Optimism, composite objectives, variance reduction, and variational bounds.
Theor. Comput. Sci., 2020

Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers.
J. Mach. Learn. Res., 2020

On Optimality of Meta-Learning in Fixed-Design Regression with Weighted Biased Regularization.
CoRR, 2020

Differentiable Meta-Learning in Contextual Bandits.
CoRR, 2020

Differentiable Bandit Exploration.
CoRR, 2020

Provably Efficient Adaptive Approximate Policy Iteration.
CoRR, 2020

Bounds and dynamics for empirical game theoretic analysis.
Auton. Agents Multi Agent Syst., 2020

Variational Policy Gradient Method for Reinforcement Learning with General Utilities.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

ImpatientCapsAndRuns: Approximately Optimal Algorithm Configuration from an Infinite Pool.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Online Algorithm for Unsupervised Sequential Selection with Contextual Information.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Efficient Planning in Large MDPs with Weak Linear Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

PAC-Bayes Analysis Beyond the Usual Bounds.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model Selection in Contextual Stochastic Bandit Problems.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Escaping the Gravitational Pull of Softmax.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

CoinDICE: Off-Policy Confidence Interval Estimation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Differentiable Meta-Learning of Bandit Policies.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model-Based Reinforcement Learning with Value-Targeted Regression.
Proceedings of the 2nd Annual Conference on Learning for Dynamics and Control, 2020

On the Global Convergence Rates of Softmax Policy Gradient Methods.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning with Good Feature Representations in Bandits and in RL with a Generative Model.
Proceedings of the 37th International Conference on Machine Learning, 2020

A simpler approach to accelerated optimization: iterative averaging meets optimism.
Proceedings of the 37th International Conference on Machine Learning, 2020

Model-Based Reinforcement Learning with Value-Targeted Regression.
Proceedings of the 37th International Conference on Machine Learning, 2020

Behaviour Suite for Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Exploration by Optimisation in Partial Monitoring.
Proceedings of the Conference on Learning Theory, 2020

Randomized Exploration in Generalized Linear Bandits.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Adaptive Exploration in Linear Contextual Bandit.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Learning with Good Feature Representations in Bandits and in RL with a Generative Model.
CoRR, 2019

Autonomous exploration for navigating in non-stationary CMPs.
CoRR, 2019

Efron-Stein PAC-Bayesian Inequalities.
CoRR, 2019

Exploration-Enhanced POLITEX.
CoRR, 2019

PAC-Bayes with Backprop.
CoRR, 2019

Empirical Bayes Regret Minimization.
CoRR, 2019

An Exponential Efron-Stein Inequality for Lq Stable Learning Rules.
CoRR, 2019

Perturbed-History Exploration in Stochastic Linear Bandits.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

BubbleRank: Safe Online Learning to Re-Rank via Implicit Click Feedback.
Proceedings of the Thirty-Fifth Conference on Uncertainty in Artificial Intelligence, 2019

Detecting Overfitting via Adversarial Examples.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Perturbed-History Exploration in Stochastic Multi-Armed Bandits.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

CapsAndRuns: An Improved Method for Approximately Optimal Algorithm Configuration.
Proceedings of the 36th International Conference on Machine Learning, 2019

Online Learning to Rank with Features.
Proceedings of the 36th International Conference on Machine Learning, 2019

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.
Proceedings of the 36th International Conference on Machine Learning, 2019

Rigorous Agent Evaluation: An Adversarial Approach to Uncover Catastrophic Failures.
Proceedings of the 7th International Conference on Learning Representations, 2019

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring.
Proceedings of the Conference on Learning Theory, 2019

Distribution-Dependent Analysis of Gibbs-ERM Principle.
Proceedings of the Conference on Learning Theory, 2019

Cleaning up the neighborhood: A full classification for adversarial partial monitoring.
Proceedings of the Algorithmic Learning Theory, 2019

An Exponential Efron-Stein Inequality for $L_q$ Stable Learning Rules.
Proceedings of the Algorithmic Learning Theory, 2019

Online Algorithm for Unsupervised Sensor Selection.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Model-Free Linear Quadratic Control via Reduction to Expert Prediction.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

An Exponential Tail Bound for the Deleted Estimate.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
A Linearly Relaxed Approximate Linear Program for Markov Decision Processes.
IEEE Trans. Autom. Control., 2018

Stochastic Optimization in a Cumulative Prospect Theory Framework.
IEEE Trans. Autom. Control., 2018

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits.
CoRR, 2018

BubbleRank: Safe Online Learning to Rerank.
CoRR, 2018

Regret Bounds for Model-Free Linear Quadratic Control.
CoRR, 2018

PAC-Bayes bounds for stable algorithms with instance-dependent priors.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

TopRank: A practical algorithm for online stochastic ranking.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

An Exponential Tail Bound for Lq Stable Learning Rules. Application to k-Folds Cross-Validation.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2018

LEAPSANDBOUNDS: A Method for Approximately Optimal Algorithm Configuration.
Proceedings of the 35th International Conference on Machine Learning, 2018

Bandits with Delayed, Aggregated Anonymous Feedback.
Proceedings of the 35th International Conference on Machine Learning, 2018

Linear Stochastic Approximation: How Far Does Constant Step-Size and Iterate Averaging Go?
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities.
J. Mach. Learn. Res., 2017

Stochastic Low-Rank Bandits.
CoRR, 2017

Bandits with Delayed Anonymous Feedback.
CoRR, 2017

Linear Stochastic Approximation: Constant Step-Size and Iterate Averaging.
CoRR, 2017

Mixing time estimation in reversible Markov chains from a single sample path.
CoRR, 2017

Crowdsourcing with Sparsely Interacting Workers.
CoRR, 2017

An a Priori Exponential Tail Bound for k-Folds Cross-Validation.
CoRR, 2017

Multi-view Matrix Factorization for Linear Dynamical System Estimation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Bernoulli Rank-1 Bandits for Click Feedback.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Online Learning to Rank in Stochastic Click Models.
Proceedings of the 34th International Conference on Machine Learning, 2017

A Modular Analysis of Adaptive (Non-)Convex Optimization: Optimism, Composite Objectives, and Variational Bounds.
Proceedings of the International Conference on Algorithmic Learning Theory, 2017

Structured Best Arm Identification with Fixed Confidence.
Proceedings of the International Conference on Algorithmic Learning Theory, 2017

The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Stochastic Rank-1 Bandits.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

Unsupervised Sequential Sensor Acquisition.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Regularized Policy Iteration with Nonparametric Function Spaces.
J. Mach. Learn. Res., 2016

Multiclass Classification Calibration Functions.
CoRR, 2016

Policy Error Bounds for Model-Based Reinforcement Learning with Factored Linear Models.
CoRR, 2016

Sequential Learning without Feedback.
CoRR, 2016

Max-affine estimators for convex stochastic programming.
CoRR, 2016

Chaining Bounds for Empirical Risk Minimization.
CoRR, 2016

SDP Relaxation with Randomized Rounding for Energy Disaggregation.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Conservative Bandits.
Proceedings of the 33rd International Conference on Machine Learning, 2016

DCM Bandits: Learning to Rank with Multiple Clicks.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Shifting Regret, Mirror Descent, and Matrices.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control.
Proceedings of the 33rd International Conference on Machine Learning, 2016

(Bandit) Convex Optimization with Biased Noisy Gradient Oracles.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Cascading Bandits.
CoRR, 2015

Learning with a Strong Adversary.
CoRR, 2015

Bayesian Optimal Control of Smoothly Parameterized Systems.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

Online Learning with Gaussian Payoffs and Side Observations.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Linear Multi-Resource Allocation with Semi-Bandit Feedback.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Combinatorial Cascading Bandits.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Mixing Time Estimation in Reversible Markov Chains from a Single Sample Path.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Fast Cross-Validation for Incremental Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Cascading Bandits: Learning to Rank in the Cascade Model.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Deterministic Independent Component Analysis.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Toward Minimax Off-policy Value Estimation.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Near-optimal max-affine estimators for convex regression.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Pathological Effects of Variance on Classification-Based Policy Iteration.
Proceedings of the Learning for General Competency in Video Games, 2015

Decision-Theoretic Clustering of Strategies.
Proceedings of the Computer Poker and Imperfect Information, 2015

2014
Sequential Learning for Multi-Channel Wireless Network Monitoring With Channel Switching Costs.
IEEE Trans. Signal Process., 2014

Guest Editors' introduction.
Theor. Comput. Sci., 2014

Online Markov Decision Processes Under Bandit Feedback.
IEEE Trans. Autom. Control., 2014

Partial Monitoring - Classification, Regret Bounds, and Algorithms.
Math. Oper. Res., 2014

On Minimax Optimal Offline Policy Evaluation.
CoRR, 2014

Bayesian Optimal Control of Smoothly Parameterized Systems: The Lazy Posterior Sampling Algorithm.
CoRR, 2014

Optimal Resource Allocation with Semi-Bandit Feedback.
Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

Universal Option Models.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Generalization Bounds for Partially Linear Models.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2014

Adaptive Monte Carlo via Bandit Allocation.
Proceedings of the 31st International Conference on Machine Learning, 2014

Online Learning in Markov Decision Processes with Changing Cost Sequences.
Proceedings of the 31st International Conference on Machine Learning, 2014

On Learning the Optimal Waiting Time.
Proceedings of the Algorithmic Learning Theory - 25th International Conference, 2014

A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models.
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014

Pseudo-MDPs and factored linear action models.
Proceedings of the 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2014

2013
Toward a classification of finite partial-monitoring games.
Theor. Comput. Sci., 2013

Alignment based kernel learning with a continuous set of base kernels.
Mach. Learn., 2013

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions.
CoRR, 2013

Online Learning with Costly Features and Labels.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Characterizing the Representer Theorem.
Proceedings of the 30th International Conference on Machine Learning, 2013

Cost-sensitive Multiclass Classification Risk Bounds.
Proceedings of the 30th International Conference on Machine Learning, 2013

Online Learning under Delayed Feedback.
Proceedings of the 30th International Conference on Machine Learning, 2013

A Randomized Mirror Descent Algorithm for Large Scale Multiple Kernel Learning.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
The adversarial stochastic shortest path problem with unknown transition probabilities.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstractions.
CoRR, 2012

A Randomized Strategy for Learning to Combine Many Features.
CoRR, 2012

The grand challenge of computer Go: Monte Carlo tree search and extensions.
Commun. ACM, 2012

Deep Representations and Codes for Image Auto-Annotation.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

Analysis of Kernel Mean Matching under Covariate Shift.
Proceedings of the 29th International Conference on Machine Learning, 2012

Statistical linear estimation with penalized estimators: an application to reinforcement learning.
Proceedings of the 29th International Conference on Machine Learning, 2012

An adaptive algorithm for finite stochastic partial monitoring.
Proceedings of the 29th International Conference on Machine Learning, 2012

Evaluation and Analysis of the Performance of the EXP3 Algorithm in Stochastic Environments.
Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

Preface.
Proceedings of the Tenth European Workshop on Reinforcement Learning, 2012

Partial Monitoring with Side Information.
Proceedings of the Algorithmic Learning Theory - 23rd International Conference, 2012

Approximate Policy Iteration with Linear Action Models.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2011
Model selection in reinforcement learning.
Mach. Learn., 2011

Agnostic KWIK learning and efficient approximate reinforcement learning.
Proceedings of the COLT 2011, 2011

X-Armed Bandits.
J. Mach. Learn. Res., 2011

Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments.
Proceedings of the COLT 2011, 2011

Regret Bounds for the Adaptive Control of Linear Quadratic Systems.
Proceedings of the COLT 2011, 2011

Non-trivial two-armed partial-monitoring games are bandits.
CoRR, 2011

Online Least Squares Estimation with Self-Normalized Processes: An Application to Bandit Problems.
CoRR, 2011

PAC-Bayesian Policy Evaluation for Reinforcement Learning.
Proceedings of the UAI 2011, 2011

Improved Algorithms for Linear Stochastic Bandits.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Sequential learning for optimal monitoring of multi-channel wireless networks.
Proceedings of the INFOCOM 2011. 30th IEEE International Conference on Computer Communications, 2011

Invited Talk: Towards Robust Reinforcement Learning Algorithms.
Proceedings of the Recent Advances in Reinforcement Learning - 9th European Workshop, 2011

Editors' Introduction.
Proceedings of the Algorithmic Learning Theory - 22nd International Conference, 2011

2010
Algorithms for Reinforcement Learning.
Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers, ISBN: 978-3-031-01551-9, 2010

Active learning in heteroscedastic noise.
Theor. Comput. Sci., 2010

A Markov-Chain Monte Carlo Approach to Simultaneous Localization and Mapping.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

REGO: Rank-based Estimation of Rényi Information using Euclidean Graph Optimization.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Models of active learning in group-structured state spaces.
Inf. Comput., 2010

X-Armed Bandits.
CoRR, 2010

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Parametric Bandits: The Generalized Linear Case.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Error Propagation for Approximate Policy and Value Iteration.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Extending rapidly-exploring random trees for asymptotically optimal anytime motion planning.
Proceedings of the 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010

Model-based reinforcement learning with nearly tight exploration complexity bounds.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Toward Off-Policy Learning Control with Function Approximation.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Budgeted Distribution Learning of Belief Net Parameters.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

The Online Loop-free Stochastic Shortest-Path Problem.
Proceedings of the COLT 2010, 2010

Toward a Classification of Finite Partial-Monitoring Games.
Proceedings of the Algorithmic Learning Theory, 21st International Conference, 2010

2009
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits.
Theor. Comput. Sci., 2009

Training parsers by inverse reinforcement learning.
Mach. Learn., 2009

Learning Exercise Policies for American Options.
Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009

A General Projection Property for Distribution Families.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Multi-Step Dyna Planning for Policy Evaluation and Control.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

Model-based and model-free reinforcement learning for visual servoing.
Proceedings of the 2009 IEEE International Conference on Robotics and Automation, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Learning when to stop thinking and do something!
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Learning to segment from a few well-selected training images.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Workshop summary: On-line learning with limited feedback.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

LMS-2: Towards an algorithm that is as cheap as LMS and almost as efficient as RLS.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009

Regularized Fitted Q-Iteration for planning in continuous-space Markovian decision problems.
Proceedings of the American Control Conference, 2009

2008
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path.
Mach. Learn., 2008

Finite-Time Bounds for Fitted Value Iteration.
J. Mach. Learn. Res., 2008

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping.
Proceedings of the UAI 2008, 2008

Speeding Up Planning in Markov Decision Processes via Automatically Constructed Abstraction.
Proceedings of the UAI 2008, 2008

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Regularized Policy Iteration.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Online Optimization in X-Armed Bandits.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Empirical Bernstein stopping.
Proceedings of the Machine Learning, 2008

Regularized Fitted Q-Iteration: Application to Planning.
Proceedings of the Recent Advances in Reinforcement Learning, 8th European Workshop, 2008

Active Learning of Group-Structured Environments.
Proceedings of the Algorithmic Learning Theory, 19th International Conference, 2008

Active Learning in Multi-armed Bandits.
Proceedings of the Algorithmic Learning Theory, 19th International Conference, 2008

2007
Apprenticeship Learning using Inverse Reinforcement Learning and Gradient Methods.
Proceedings of the UAI 2007, 2007

Fitted Q-iteration in continuous action-space MDPs.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Continuous Time Associative Bandit Problems.
Proceedings of the IJCAI 2007, 2007

Sequence Prediction Exploiting Similarity Information.
Proceedings of the IJCAI 2007, 2007

Manifold-adaptive dimension estimation.
Proceedings of the Machine Learning, 2007

Improved Rates for the Stochastic Continuum-Armed Bandit Problem.
Proceedings of the Learning Theory, 20th Annual Conference on Learning Theory, 2007

Tuning Bandit Algorithms in Stochastic Environments.
Proceedings of the Algorithmic Learning Theory, 18th International Conference, 2007

2006
Universal parameter optimisation in games based on SPSA.
Mach. Learn., 2006

Local Importance Sampling: A Novel Technique to Enhance Particle Filtering.
J. Multim., 2006

Bandit Based Monte-Carlo Planning.
Proceedings of the Machine Learning: ECML 2006, 2006

RSPSA: Enhanced Parameter Optimization in Games.
Proceedings of the Advances in Computer Games, 11th International Conference, 2006

2005
Finite time bounds for sampling based fitted value iteration.
Proceedings of the Machine Learning, 2005

X-mHMM: An Efficient Algorithm for Training Mixtures of HMMs When the Number of Mixtures Is Unknown.
Proceedings of the 5th IEEE International Conference on Data Mining (ICDM 2005), 2005

Log-optimal currency portfolios and control Lyapunov exponents.
Proceedings of the 44th IEEE Conference on Decision and Control and 8th European Control Conference, 2005

2004
Interpolation-based Q-learning.
Proceedings of the Machine Learning, 2004

Margin Maximizing Discriminant Analysis.
Proceedings of the Machine Learning: ECML 2004, 2004

Enhancing Particle Filters Using Local Likelihood Sampling.
Proceedings of the Computer Vision, 2004

Kernel Machine Based Feature Extraction Algorithms for Regression Problems.
Proceedings of the 16th European Conference on Artificial Intelligence, 2004

Shortest Path Discovery Problems: A Framework, Algorithms and Experimental Results.
Proceedings of the Nineteenth National Conference on Artificial Intelligence, 2004

2003
Sequential Importance Sampling for Visual Tracking Reconsidered.
Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003

2002
An Asymptotic Scaling Analysis of LQ Performance for an Approximate Adaptive Control Design.
Math. Control. Signals Syst., 2002

LQ performance bounds for adaptive output feedback controllers for functionally uncertain nonlinear systems.
Autom., 2002

2001
Ockham's Razor Modeling of the Matrisome Channels of the Basal Ganglia Thalamocortical Loops.
Int. J. Neural Syst., 2001

Efficient approximate planning in continuous space Markovian Decision Problems.
AI Commun., 2001

2000
Uncertainty, performance, and model dependency in approximate adaptive nonlinear control.
IEEE Trans. Autom. Control., 2000

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms.
Mach. Learn., 2000

Modular Reinforcement Learning: A Case Study in a Robot Domain.
Acta Cybern., 2000

FlexVoice: A Parametric Approach to High-Quality Speech Synthesis.
Proceedings of the Text, Speech and Dialogue - Third International Workshop, 2000

1999
Parallel and robust skeletonization built on self-organizing elements.
Neural Networks, 1999

A Unified Analysis of Value-Function-Based Reinforcement Learning Algorithms.
Neural Comput., 1999

The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments.
Nucleic Acids Res., 1999

1998
Module-Based Reinforcement Learning: Experiments with a Real Robot.
Mach. Learn., 1998

An integrated architecture for motion-control and path-planning.
J. Field Robotics, 1998

Non-Markovian Policies in Sequential Decision Problems.
Acta Cybern., 1998

Performance-Evaluation for Automated Detection of Microcalcifications in Mammograms Using Three Different Film-Digitizers.
Proceedings of the Digital Mammography, 1998

Automated Detection and Classification of Micro-Calcifications in Mammograms Using Artificial Neural Nets.
Proceedings of the Digital Mammography, 1998

Multi-criteria Reinforcement Learning.
Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

1997
Neurocontroller using dynamic state feedback for compensatory control.
Neural Networks, 1997

The Asymptotic Convergence-Rate of Q-learning.
Proceedings of the Advances in Neural Information Processing Systems 10, 1997

Module Based Reinforcement Learning: An Application to a Real Robot.
Proceedings of the Learning Robots, 6th European Workshop, 1997

Learning and Exploitation Do Not Conflict Under Minimax Optimality.
Proceedings of the Machine Learning: ECML-97, 1997

1996
Approximate geometry representations and sensory fusion.
Neurocomputing, 1996

Self-Organizing Multi-Resolution Grid for Motion Planning and Control.
Int. J. Neural Syst., 1996

A Generalized Reinforcement-Learning Model: Convergence and Applications.
Proceedings of the Machine Learning, 1996

Inverse Dynamics Controllers for Robust Control: Consequences for Neurocontrollers.
Proceedings of the Artificial Neural Networks, 1996

1994
Topology Learning Solved by Extended Objects: A Neural Network Model.
Neural Comput., 1994

1993
Behavior of an Adaptive Self-organizing Autonomous Agent Working with Cues and Competing Concepts.
Adapt. Behav., 1993

