A. Rupam Mahmood

Gautham Vasan

CoRR, 2024

Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning.

RLJ, 2024

Learning to Optimize for Reinforcement Learning.

RLJ, 2024

More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling.

RLJ, 2024

Weight Clipping for Deep Continual and Reinforcement Learning.

RLJ, 2024

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation.

Christopher K. Harris

Dale Schuurmans

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Revisiting Scalable Hessian Diagonal Approximations for Applications in Reinforcement Learning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo.

Kamyar Azizzadenesheli

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning.

Decebal Constantin Mocanu

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning.

Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

2023

Memory-efficient Reinforcement Learning with Value-based Knowledge Consolidation.

Trans. Mach. Learn. Res., 2023

Elephant Neural Networks: Born to Be a Continual Learner.

Qingfeng Lan

J. Fernando Hernandez-Garcia

CoRR, 2023

Maintaining Plasticity in Deep Continual Learning.

Shibhansh Dohare

Parash Rahman

CoRR, 2023

Utility-based Perturbed Gradient Descent: An Optimizer for Continual Learning.

CoRR, 2023

Loosely consistent emphatic temporal-difference learning.

Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

Dynamic Decision Frequency with Continuous Options.

IROS, 2023

Reducing the Cost of Cycle-Time Tuning for Real-World Policy Optimization.

Homayoon Farrahi

Proceedings of the International Joint Conference on Neural Networks, 2023

Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers.

Yan Wang

Gautham Vasan

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Correcting discount-factor mismatch in on-policy policy gradient methods.

Fengdi Che

Gautham Vasan

Proceedings of the International Conference on Machine Learning, 2023

2022

Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences.

J. Mach. Learn. Res., 2022

Variable-Decision Frequency Option Critic.

CoRR, 2022

HesScale: Scalable Computation of Hessian Diagonals.

CoRR, 2022

Memory-efficient Reinforcement Learning with Knowledge Consolidation.

CoRR, 2022

Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots.

Yufeng Yuan

Proceedings of the 2022 International Conference on Robotics and Automation, 2022

A Temporal-Difference Approach to Policy Gradient Estimation.

Proceedings of the International Conference on Machine Learning, 2022

Model-free Policy Learning with Reward Gradients.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

An Alternate Policy Gradient Estimator for Softmax Policies.

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness.

Shibhansh Dohare

CoRR, 2021

Model-free Policy Learning with Reward Gradients.

Qingfeng Lan

CoRR, 2021

Analyzing Neural Jacobian Methods in Applications of Visual Servoing and Kinematic Control.

Proceedings of the IEEE International Conference on Robotics and Automation, 2021

2020

Heteroscedastic Uncertainty for Robust Generative Latent Dynamics.

IEEE Robotics Autom. Lett., 2020

2019

Autoregressive Policies for Continuous Control Deep Reinforcement Learning.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2018

On Generalized Bellman Equations and Temporal-Difference Learning.

Huizhen Yu

J. Mach. Learn. Res., 2018

Setting up a Reinforcement Learning Task with a Real-World Robot.

Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Benchmarking Reinforcement Learning Algorithms on Real-World Robots.

Proceedings of the 2nd Annual Conference on Robot Learning, 2018

2017

Multi-step Off-policy Learning Without Importance Sampling Ratios.

Huizhen Yu

CoRR, 2017

2016

An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning.

Martha White

J. Mach. Learn. Res., 2016

True Online Temporal-Difference Learning.

Harm van Seijen

Patrick M. Pilarski

Marlos C. Machado

J. Mach. Learn. Res., 2016

2015

An Empirical Evaluation of True Online TD(λ).

Harm van Seijen

Patrick M. Pilarski

CoRR, 2015

Emphatic Temporal-Difference Learning.

Huizhen Yu

Martha White

CoRR, 2015

Off-policy learning based on weighted importance sampling with linear computational complexity.

Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

2014

Off-policy TD(λ) with a true online equivalence.

Hado van Hasselt

Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

Weighted importance sampling for off-policy learning with linear function approximation.

Hado van Hasselt

Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, 2014

A new Q(λ) with interim forward view and Monte Carlo equivalence.

Doina Precup

Hado van Hasselt

Proceedings of the 31st International Conference on Machine Learning, 2014

2013

Position Paper: Representation Search through Generate and Test.

Proceedings of the Tenth Symposium on Abstraction, Reformulation, and Approximation, 2013

Representation Search through Generate and Test.

Proceedings of the Workshop on Learning Rich Representations from Low-Level Sensors, 2013

2012

Tuning-free step-size adaptation.
