Richard S. Sutton

E. James Kehoe

Neural Comput., 2008

Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping.

Proceedings of the UAI 2008, 2008

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation.

Csaba Szepesvári

Hamid Reza Maei

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

A computational model of hippocampal function in trace conditioning.

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Sample-based learning and search with permanent and transient memories.

David Silver

Martin Müller

Proceedings of the Machine Learning, 2008

Agent Learning using Action-Dependent Learning Rates in Computer Role-Playing Games.

Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference, 2008

2007

Incremental Natural Actor-Critic Algorithms.

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Reinforcement Learning of Local Shape in the Game of Go.

David Silver

Martin Müller

Proceedings of the IJCAI 2007, 2007

On the role of tracking in stationary environments.

Anna Koop

David Silver

Proceedings of the Machine Learning, 2007

2006

iLSTD: Eligibility Traces and Convergence Analysis.

Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Incremental Least-Squares Temporal Difference Learning.

Alborz Geramifard

Michael H. Bowling

Proceedings of the Proceedings, 2006

2005

Reinforcement Learning for RoboCup Soccer Keepaway.

Gregory Kuhlmann

Adapt. Behav., 2005

Temporal Abstraction in Temporal-difference Networks.

Eddie J. Rafols

Anna Koop

Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Off-policy Learning with Options and Recognizers.

Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Temporal-Difference Networks with History.

Brian Tanner

IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30, 2005

Using Predictive Representations to Improve Generalization in Reinforcement Learning.

IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30, 2005

TD(lambda) networks: temporal-difference networks with eligibility traces.

Brian Tanner

Proceedings of the Machine Learning, 2005

2004

Temporal-Difference Networks.

Brian Tanner

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

2001

Keepaway Soccer: A Machine Learning Testbed.

Proceedings of the RoboCup 2001: Robot Soccer World Cup V, 2001

Predictive Representations of State.

Michael L. Littman

Proceedings of the Advances in Neural Information Processing Systems 14, 2001

Scaling Reinforcement Learning toward RoboCup Soccer.

Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28, 2001

Off-Policy Temporal Difference Learning with Function Approximation.

Sanjoy Dasgupta

Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), Williams College, Williamstown, MA, USA, June 28, 2001

2000

Reinforcement Learning for 3 vs. 2 Keepaway.

Proceedings of the RoboCup 2000: Robot Soccer World Cup IV, 2000

Eligibility Traces for Off-Policy Policy Evaluation.

Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

1999

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.

Artif. Intell., 1999

Policy Gradient Methods for Reinforcement Learning with Function Approximation.

Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, Colorado, USA, November 29, 1999

Open Theoretical Questions in Reinforcement Learning.

Proceedings of the Computational Learning Theory, 4th European Conference, 1999

1998

Reinforcement Learning: An Introduction.

IEEE Trans. Neural Networks, 1998

Reinforcement Learning: Past, Present and Future.

Proceedings of the Simulated Evolution and Learning, 1998

Improved Switching among Temporally Abstract Actions.

Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Learning Instance-Independent Value Functions to Enhance Local Search.

Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Intra-Option Learning about Temporally Abstract Actions.

Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Theoretical Results on Reinforcement Learning with Temporally Abstract Options.

Proceedings of the Machine Learning: ECML-98, 1998

Reinforcement learning - an introduction.

Adaptive computation and machine learning, MIT Press, ISBN: 978-0-262-19398-6, 1998

1997

Experiments with Reinforcement Learning in Problems with Continuous State and Action Spaces.

Juan Carlos Santamaría

Ashwin Ram

Adapt. Behav., 1997

Multi-time Models for Temporally Abstract Planning.

Proceedings of the Advances in Neural Information Processing Systems 10, 1997

Exponentiated Gradient Methods for Reinforcement Learning.

Proceedings of the Fourteenth International Conference on Machine Learning (ICML 1997), 1997

On the Significance of Markov Decision Processes.

Proceedings of the Artificial Neural Networks, 1997

1996

Reinforcement Learning with Replacing Eligibility Traces.

Satinder P. Singh

Mach. Learn., 1996

1995

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding.

Proceedings of the Advances in Neural Information Processing Systems 8, 1995

TD Models: Modeling the World at a Mixture of Time Scales.

Proceedings of the Machine Learning, 1995

1993

Online Learning with Random Representations.

Steven D. Whitehead

Proceedings of the Machine Learning, 1993

1992

Adapting Bias by Gradient Descent: An Incremental Version of Delta-Bar-Delta.

Proceedings of the 10th National Conference on Artificial Intelligence, 1992

1991

Dyna, an Integrated Architecture for Learning, Planning, and Reacting.

SIGART Bull., 1991

Iterative Construction of Sparse Polynomial Approximations.

Terence D. Sanger

Christopher J. Matheus

Proceedings of the Advances in Neural Information Processing Systems 4, 1991

Learning Polynomial Functions by Feature Construction.

Christopher J. Matheus

Proceedings of the Eighth International Workshop (ML91), 1991

Planning by Incremental Dynamic Programming.

Proceedings of the Eighth International Workshop (ML91), 1991

1990

Integrated Modeling and Control Based on Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 3, 1990

Integrated Architectures for Learning, Planning, and Reacting Based on Approximating Dynamic Programming.

Proceedings of the Machine Learning, 1990

1989

Sequential Decision Problems and Neural Networks.

Christopher J. C. H. Watkins

Proceedings of the Advances in Neural Information Processing Systems 2, 1989

1988

Learning to Predict by the Methods of Temporal Differences.

Mach. Learn., 1988

1985

Training and Tracking in Robotics.

Oliver G. Selfridge

Proceedings of the 9th International Joint Conference on Artificial Intelligence. Los Angeles, 1985

1983

Neuronlike adaptive elements that can solve difficult learning control problems.
