Richard S. Sutton

ORCID: 0000-0002-3679-3415

Affiliations:
  • DeepMind Alberta, Edmonton, AB, Canada
  • University of Alberta, Department of Computing Science, Edmonton, AB, Canada
  • University of Massachusetts Amherst, MA, USA (PhD 1984)


According to our database, Richard S. Sutton authored at least 163 papers between 1983 and 2024.

Bibliography

2024
Loss of plasticity in deep continual learning.
Nat., August, 2024

Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning.
CoRR, 2024

On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes.
CoRR, 2024

MetaOptimize: A Framework for Optimizing Step Sizes and Other Meta-parameters.
CoRR, 2024

Step-size Optimization for Continual Learning.
CoRR, 2024

Reward Centering.
RLJ, 2024

SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning.
RLJ, 2024

An Idiosyncrasy of Time-discretization in Reinforcement Learning.
RLJ, 2024

Reward-Respecting Subtasks for Model-Based Reinforcement Learning (Abstract Reprint).
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Reward-respecting subtasks for model-based reinforcement learning.
Artif. Intell., November, 2023

Communicative capital: a key resource for human-machine shared agency and collaborative capacity.
Neural Comput. Appl., August, 2023

From eye-blinks to state construction: Diagnostic benchmarks for online representation learning.
Adapt. Behav., February, 2023

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks.
J. Mach. Learn. Res., 2023

A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays.
CoRR, 2023

Iterative Option Discovery for Planning, by Planning.
CoRR, 2023

Maintaining Plasticity in Deep Continual Learning.
CoRR, 2023

Online Real-Time Recurrent Learning Using Sparse Connections and Selective Learning.
CoRR, 2023

Toward Efficient Gradient-Based Value Estimation.
Proceedings of the 40th International Conference on Machine Learning, 2023

Auxiliary task discovery through generate-and-test.
Proceedings of the Conference on Lifelong Learning Agents, 2023

Value-aware Importance Weighting for Off-policy Reinforcement Learning.
Proceedings of the Conference on Lifelong Learning Agents, 2023

2022
On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs.
CoRR, 2022

The Alberta Plan for AI Research.
CoRR, 2022

Toward Discovering Options that Achieve Faster Planning.
CoRR, 2022

The Quest for a Common Model of the Intelligent Decision Maker.
CoRR, 2022

A History of Meta-gradient: Gradient Methods for Meta-learning.
CoRR, 2022

Doubly-Asynchronous Value Iteration: Making Value Iteration Asynchronous in Actions.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
Looking Back on the Actor-Critic Architecture.
IEEE Trans. Syst. Man Cybern. Syst., 2021

Learning Agent State Online with Recurrent Generate-and-Test.
CoRR, 2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms in the Four Rooms Environment.
CoRR, 2021

Continual Backprop: Stochastic Gradient Descent with Persistent Randomness.
CoRR, 2021

An Empirical Comparison of Off-policy Prediction Learning Algorithms on the Collision Task.
CoRR, 2021

Planning with Expectation Models for Control.
CoRR, 2021

Scalable Online Recurrent Learning Using Columnar Neural Networks.
CoRR, 2021

Does Standard Backpropagation Forget Less Catastrophically Than Adam?
CoRR, 2021

Policy iterations for reinforcement learning problems in continuous time and space - Fundamental theory and methods.
Autom., 2021

Reward is enough.
Artif. Intell., 2021

Average-Reward Learning and Planning with Options.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Average-Reward Off-Policy Policy Evaluation with Function Approximation.
Proceedings of the 38th International Conference on Machine Learning, 2021

Learning and Planning in Average-Reward Markov Decision Processes.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Special Issue "On Defining Artificial Intelligence" - Commentaries and Author's Response.
J. Artif. Gen. Intell., 2020

Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning.
CoRR, 2020

Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI.
CoRR, 2020

Inverse Policy Evaluation for Value-based Sequential Decision-making.
CoRR, 2020

Behaviour Suite for Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Fixed-Horizon Temporal Difference Methods for Stable Reinforcement Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Learning Sparse Representations Incrementally in Deep Reinforcement Learning.
CoRR, 2019

Discounted Reinforcement Learning is Not an Optimization Problem.
CoRR, 2019

Learning Feature Relevance Through Step Size Adaptation in Temporal-Difference Learning.
CoRR, 2019

Should All Temporal Difference Learning Use Emphasis?
CoRR, 2019

Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target.
CoRR, 2019

Planning with Expectation Models.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Extending Sliding-Step Importance Weighting from Supervised Learning to Reinforcement Learning.
Proceedings of the IJCAI 2019 International Workshops, 2019

Prediction in Intelligence: An Empirical Comparison of Off-policy Algorithms on Robots.
Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, 2019

2018
On Generalized Bellman Equations and Temporal-Difference Learning.
J. Mach. Learn. Res., 2018

Reactive Reinforcement Learning in Asynchronous Environments.
Frontiers Robotics AI, 2018

Online Off-policy Prediction.
CoRR, 2018

Predicting Periodicity with Temporal Difference Learning.
CoRR, 2018

Integrating Episodic Memory into a Reinforcement Learning Agent using Reservoir Sampling.
CoRR, 2018

Two geometric input transformation methods for fast online reinforcement learning with neural nets.
CoRR, 2018

TIDBD: Adapting Temporal-difference Step-sizes Through Stochastic Meta-descent.
CoRR, 2018

Directly Estimating the Variance of the λ-Return Using Temporal-Difference Methods.
CoRR, 2018

Comparing Direct and Indirect Temporal-Difference Methods for Estimating the Variance of the Return.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

Per-decision Multi-step Temporal Difference Learning with Control Variates.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

Multi-Step Reinforcement Learning: A Unifying Algorithm.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
A Deeper Look at Experience Replay.
CoRR, 2017

Communicative Capital for Prosthetic Agents.
CoRR, 2017

GQ(λ) Quick Reference and Implementation Guide.
CoRR, 2017

Multi-step Off-policy Learning Without Importance Sampling Ratios.
CoRR, 2017

Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space.
CoRR, 2017

A First Empirical Study of Emphatic Temporal Difference Learning.
CoRR, 2017

Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2017

Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning.
Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 2017

2016
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning.
J. Mach. Learn. Res., 2016

True Online Temporal-Difference Learning.
J. Mach. Learn. Res., 2016

Face valuing: Training user interfaces with facial expressions and reinforcement learning.
CoRR, 2016

Learning representations through stochastic gradient descent in cross-validation error.
CoRR, 2016

A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward.
CoRR, 2016

2015
True Online Emphatic TD(λ): Quick Reference and Implementation Guide.
CoRR, 2015

An Empirical Evaluation of True Online TD(λ).
CoRR, 2015

Emphatic Temporal-Difference Learning.
CoRR, 2015

Learning to Predict Independent of Span.
CoRR, 2015

Off-policy learning based on weighted importance sampling with linear computational complexity.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015

A Deeper Look at Planning as Learning from Replay.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Multi-timescale nexting in a reinforcement learning robot.
Adapt. Behav., 2014

Off-policy TD(λ) with a true online equivalence.
Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 2014

Universal Option Models.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Weighted importance sampling for off-policy learning with linear function approximation.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

A new Q(λ) with interim forward view and Monte Carlo equivalence.
Proceedings of the 31st International Conference on Machine Learning, 2014

True Online TD(λ).
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
Adaptive Artificial Limbs: A Real-Time Approach to Prediction and Anticipation.
IEEE Robotics Autom. Mag., 2013

Temporal-Difference Learning to Assist Human Decision Making during the Control of an Artificial Limb.
CoRR, 2013

Position Paper: Representation Search through Generate and Test.
Proceedings of the Tenth Symposium on Abstraction, Reformulation, and Approximation, 2013

Real-time prediction learning for the simultaneous actuation of multiple prosthetic joints.
Proceedings of the IEEE 13th International Conference on Rehabilitation Robotics, 2013

Planning by Prioritized Sweeping with Small Backups.
Proceedings of the 30th International Conference on Machine Learning, 2013

Representation Search through Generate and Test.
Proceedings of the Learning Rich Representations from Low-Level Sensors, 2013

2012
Temporal-difference search in computer Go.
Mach. Learn., 2012

Off-Policy Actor-Critic.
CoRR, 2012

Acquiring a broad range of empirical knowledge in real time by temporal-difference learning.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2012

Linear Off-Policy Actor-Critic.
Proceedings of the 29th International Conference on Machine Learning, 2012

Scaling life-long off-policy learning.
Proceedings of the 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics, 2012

Tuning-free step-size adaptation.
Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012

Model-Free reinforcement learning with continuous action in practice.
Proceedings of the American Control Conference, 2012

Between Instruction and Reward: Human-Prompted Switching.
Proceedings of the Robots Learning Interactively from Human Teachers, 2012

2011
Beyond Reward: The Problem of Knowledge and Data.
Proceedings of the Inductive Logic Programming - 21st International Conference, 2011

Horde: a scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction.
Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), 2011

2010
Toward Off-Policy Learning Control with Function Approximation.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

2009
Natural actor-critic algorithms.
Autom., 2009

Multi-Step Dyna Planning for Policy Evaluation and Control.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009, 2009

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

2008
Stimulus Representation and the Timing of Reward-Prediction Errors in Models of the Dopamine System.
Neural Comput., 2008

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping.
Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI 2008), 2008

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

A computational model of hippocampal function in trace conditioning.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Sample-based learning and search with permanent and transient memories.
Proceedings of the 25th International Conference on Machine Learning (ICML 2008), 2008

Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games.
Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, 2008

2007
Incremental Natural Actor-Critic Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Reinforcement Learning of Local Shape in the Game of Go.
Proceedings of the IJCAI 2007, 2007

On the role of tracking in stationary environments.
Proceedings of the 24th International Conference on Machine Learning (ICML 2007), 2007

2006
iLSTD: Eligibility Traces and Convergence Analysis.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Incremental Least-Squares Temporal Difference Learning.
Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI 2006), 2006

2005
Reinforcement Learning for RoboCup Soccer Keepaway.
Adapt. Behav., 2005

Temporal Abstraction in Temporal-difference Networks.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Off-policy Learning with Options and Recognizers.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Temporal-Difference Networks with History.
Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 2005

Using Predictive Representations to Improve Generalization in Reinforcement Learning.
Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 2005

TD(λ) networks: temporal-difference networks with eligibility traces.
Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005

2004
Temporal-Difference Networks.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

2001
Keepaway Soccer: A Machine Learning Testbed.
Proceedings of the RoboCup 2001: Robot Soccer World Cup V, 2001

Predictive Representations of State.
Proceedings of the Advances in Neural Information Processing Systems 14, 2001

Scaling Reinforcement Learning toward RoboCup Soccer.
Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), 2001

Off-Policy Temporal Difference Learning with Function Approximation.
Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), 2001

2000
Reinforcement Learning for 3 vs. 2 Keepaway.
Proceedings of the RoboCup 2000: Robot Soccer World Cup IV, 2000

Eligibility Traces for Off-Policy Policy Evaluation.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), 2000

1999
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
Artif. Intell., 1999

Policy Gradient Methods for Reinforcement Learning with Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 12, 1999

Open Theoretical Questions in Reinforcement Learning.
Proceedings of the 4th European Conference on Computational Learning Theory (EuroCOLT '99), 1999

1998
Reinforcement Learning: An Introduction.
IEEE Trans. Neural Networks, 1998

Reinforcement Learning: Past, Present and Future.
Proceedings of the Simulated Evolution and Learning, 1998

Improved Switching among Temporally Abstract Actions.
Proceedings of the Advances in Neural Information Processing Systems 11, 1998

Learning Instance-Independent Value Functions to Enhance Local Search.
Proceedings of the Advances in Neural Information Processing Systems 11, 1998

Intra-Option Learning about Temporally Abstract Actions.
Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Theoretical Results on Reinforcement Learning with Temporally Abstract Options.
Proceedings of the 10th European Conference on Machine Learning (ECML-98), 1998

Reinforcement learning - an introduction.
Adaptive computation and machine learning, MIT Press, ISBN: 978-0-262-19398-6, 1998

1997
Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces.
Adapt. Behav., 1997

Multi-time Models for Temporally Abstract Planning.
Proceedings of the Advances in Neural Information Processing Systems 10, 1997

Exponentiated Gradient Methods for Reinforcement Learning.
Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), 1997

On the Significance of Markov Decision Processes.
Proceedings of the International Conference on Artificial Neural Networks (ICANN '97), 1997

1996
Reinforcement Learning with Replacing Eligibility Traces.
Mach. Learn., 1996

1995
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding.
Proceedings of the Advances in Neural Information Processing Systems 8, 1995

TD Models: Modeling the World at a Mixture of Time Scales.
Proceedings of the Twelfth International Conference on Machine Learning (ICML 1995), 1995

1993
Online Learning with Random Representations.
Proceedings of the Tenth International Conference on Machine Learning (ICML 1993), 1993

1992
Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta.
Proceedings of the 10th National Conference on Artificial Intelligence, 1992

1991
Dyna, an Integrated Architecture for Learning, Planning, and Reacting.
SIGART Bull., 1991

Iterative Construction of Sparse Polynomial Approximations.
Proceedings of the Advances in Neural Information Processing Systems 4, 1991

Learning Polynomial Functions by Feature Construction.
Proceedings of the Eighth International Workshop on Machine Learning (ML91), 1991

Planning by Incremental Dynamic Programming.
Proceedings of the Eighth International Workshop on Machine Learning (ML91), 1991

1990
Integrated Modeling and Control Based on Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 3, 1990

Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming.
Proceedings of the Seventh International Conference on Machine Learning (ICML 1990), 1990

1989
Sequential Decision Problems and Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 2, 1989

1988
Learning to Predict by the Methods of Temporal Differences.
Mach. Learn., 1988

1985
Training and Tracking in Robotics.
Proceedings of the 9th International Joint Conference on Artificial Intelligence (IJCAI 1985), Los Angeles, 1985

1983
Neuronlike adaptive elements that can solve difficult learning control problems.
IEEE Trans. Syst. Man Cybern., 1983

