Satinder Singh

ORCID: 0000-0002-5169-9486

Affiliations:
  • DeepMind, London, UK
  • University of Michigan, Department of Electrical Engineering and Computer Science, Ann Arbor, MI, USA
  • Syntek Capital
  • AT&T Labs, Florham Park, NJ, USA
  • University of Colorado Boulder, Department of Computer Science, CO, USA
  • Massachusetts Institute of Technology (MIT), Brain and Cognitive Science Department, Cambridge, MA, USA


According to our database, Satinder Singh authored at least 250 papers between 1991 and 2024.

Bibliography

2024
Attention learning models using local Zernike moments-based normalized images and convolutional neural networks for skin lesion classification.
Biomed. Signal Process. Control., 2024


2023
Risk-aware analysis for interpretations of probabilistic achievement and maintenance commitments.
Artif. Intell., April, 2023

POMRL: No-Regret Learning-to-Plan with Increasing Horizons.
Trans. Mach. Learn. Res., 2023

Diversifying AI: Towards Creative Chess with AlphaZero.
CoRR, 2023

On the Convergence of Bounded Agents.
CoRR, 2023

Hierarchical Reinforcement Learning in Complex 3D Environments.
CoRR, 2023

Optimistic Meta-Gradients.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Combining Behaviors with the Successor Features Keyboard.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models can Implement Policy Iteration.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Definition of Continual Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Structured State Space Models for In-Context Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs.
Proceedings of the International Conference on Machine Learning, 2023


Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

In-context Reinforcement Learning with Algorithm Distillation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Composing Task Knowledge With Modular Successor Feature Approximators.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Discovering Evolution Strategies via Meta-Black-Box Optimization.
Proceedings of the Companion Proceedings of the Conference on Genetic and Evolutionary Computation, 2023

2022
In-Context Policy Iteration.
CoRR, 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.
CoRR, 2022

GrASP: Gradient-Based Affordance Selection for Planning.
CoRR, 2022

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Approximate Value Equivalence.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Expressivity of Markov Reward (Extended Abstract).
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Bootstrapped Meta-Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Meta-Gradients in Non-Stationary Environments.
Proceedings of the Conference on Lifelong Learning Agents, 2022

Adaptive Pairwise Weights for Temporal Credit Assignment.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks.
CoRR, 2021

Discovering Diverse Nearly Optimal Policies with Successor Features.
CoRR, 2021

Pairwise Weights for Temporal Credit Assignment.
CoRR, 2021

Reward is enough.
Artif. Intell., 2021

Learning State Representations from Random Deep Action-conditional Predictions.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Reward is enough for convex MDPs.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Discovery of Options via Meta-Learned Subgoals.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Proper Value Equivalence.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Expressivity of Markov Reward.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Reinforcement Learning of Implicit and Explicit Control Flow Instructions.
Proceedings of the 38th International Conference on Machine Learning, 2021

Discovering a set of policies for the worst case reward.
Proceedings of the 9th International Conference on Learning Representations, 2021

Efficient Querying for Cooperative Probabilistic Commitments.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments.
CoRR, 2020

Self-Tuning Deep Reinforcement Learning.
CoRR, 2020

Semantics and algorithms for trustworthy commitment achievement under model uncertainty.
Auton. Agents Multi Agent Syst., 2020

A Self-Tuning Actor-Critic Algorithm.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Meta-Gradient Reinforcement Learning with an Objective Discovered Online.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

On Efficiency in Hierarchical Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Discovering Reinforcement Learning Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

The Value Equivalence Principle for Model-Based Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning to Play No-Press Diplomacy with Best Response Policy Iteration.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

What Can Learned Intrinsic Rewards Capture?
Proceedings of the 37th International Conference on Machine Learning, 2020

Behaviour Suite for Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

How Should an Agent Practice?
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Modeling Probabilistic Commitments for Maintenance Is Inherently Harder than for Achievement.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Online and Scalable Adaptive Cyber Defense.
Proceedings of the Adversarial and Uncertain Reasoning for Adaptive Cyber Defense, 2019

Disentangled Cumulants Help Successor Representations Transfer to New Tasks.
CoRR, 2019

Object-oriented state editing for HRL.
CoRR, 2019

Learning Independently-Obtainable Reward Functions.
CoRR, 2019

NE-Table: A Neural key-value table for Named Entities.
Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2019

Discovery of Useful Questions as Auxiliary Tasks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

No-Press Diplomacy: Modeling Multi-Agent Gameplay.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Hindsight Credit Assignment.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Computational Strategies for the Trustworthy Pursuit and the Safe Modeling of Probabilistic Maintenance Commitments.
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019

Deep Reinforcement Learning for Multi-driver Vehicle Dispatching and Repositioning Problem.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Learning to Communicate and Solve Visual Blocks-World Tasks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Multistage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis.
Secur. Commun. Networks, 2018

Generative Adversarial Self-Imitation Learning.
CoRR, 2018

Many-Goals Reinforcement Learning.
CoRR, 2018

Named Entities troubling your Neural Methods? Build NE-Table: A neural approach for handling Named Entities.
CoRR, 2018

The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA.
CoRR, 2018

On Learning Intrinsic Rewards for Policy Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Completing State Representations using Spectral Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Self-Imitation Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Learning End-to-End Goal-Oriented Dialog with Multiple Answers.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Challenges in the Trustworthy Pursuit of Maintenance Commitments Under Uncertainty.
Proceedings of the 20th International Trust Workshop co-located with AAMAS/IJCAI/ECAI/ICML 2018, 2018

On Querying for Safe Optimality in Factored Markov Decision Processes.
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

Markov Decision Processes with Continuous Side Information.
Proceedings of the Algorithmic Learning Theory, 2018

2017
Value Prediction Network.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Repeated Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Learning to Query, Reason, and Answer Questions On Ambiguous Texts.
Proceedings of the 5th International Conference on Learning Representations, 2017

A Stackelberg Game Model for Botnet Data Exfiltration.
Proceedings of the Decision and Game Theory for Security - 8th International Conference, 2017

Predicting Counselor Behaviors in Motivational Interviewing Encounters.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Multi-Stage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis.
Proceedings of the 2017 Workshop on Moving Target Defense, 2017

Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making.
Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling, 2017

Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes.
Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling, 2017

Understanding and Predicting Empathic Behavior in Counseling Therapy.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

A Stackelberg Game Model for Botnet Traffic Exfiltration.
Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Multi-task seizure detection: addressing intra-patient variation in seizure morphologies.
Mach. Learn., 2016

Towards Resolving Unidentifiability in Inverse Reinforcement Learning.
CoRR, 2016

Gradient Methods for Stackelberg Games.
Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Building a Motivational Interviewing Dataset.
Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2016

Commitment Semantics for Sequential Decision Making under Reward Uncertainty.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

On Structural Properties of MDPs that Bound Loss Due to Shallow Planning.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

The Dependence of Effective Planning Horizon on Model Accuracy.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Control of Memory, Active Perception, and Action in Minecraft.
Proceedings of the 33rd International Conference on Machine Learning, 2016

On the Trustworthy Fulfillment of Commitments.
Proceedings of the 18th International Workshop on Trust in Agent Societies co-located with the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), 2016

Improving Predictive State Representations via Gradient Descent.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Action-Conditional Video Prediction using Deep Networks in Atari Games.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Abstraction Selection in Model-based Reinforcement Learning.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Low-Rank Spectral Learning with Weighted Loss Functions.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Commitment Semantics for Sequential Decision Making Under Reward Uncertainty.
Proceedings of the 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015, 2015

Spectral Learning of Predictive State Representations with Insufficient Statistics.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization.
Top. Cogn. Sci., 2014

Utility Maximization and Bounds on Human Information Processing.
Top. Cogn. Sci., 2014

Optimal Rewards for Cooperative Agents.
IEEE Trans. Auton. Ment. Dev., 2014

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech.
Proceedings of the IEEE International Conference on Acoustics, 2014

Improving UCT planning via approximate homomorphisms.
Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2014

Low-Rank Spectral Learning.
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014

Characterizing EVOI-Sufficient k-Response Query Sets in Decision Problems.
Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014

Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Games.
Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling, 2014

Computationally Rational Saccadic Control: An Explanation of Spillover Effects Based on Sampling from Noisy Perception and Memory.
Proceedings of the Fifth Workshop on Cognitive Modeling and Computational Linguistics, 2014

Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

Predicting Postoperative Atrial Fibrillation from Independent ECG Components.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013
The Adaptive Nature of Eye Movements in Linguistic Tasks: How Payoff and Architecture Shape Speed-Accuracy Trade-Offs.
Top. Cogn. Sci., 2013

Nash Convergence of Gradient Dynamics in Iterated General-Sum Games.
CoRR, 2013

Reward Mapping for Transfer in Long-Lived Agents.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Linking Context to Evaluation in the Design of Safety Critical Interfaces.
Proceedings of the Human-Computer Interaction. Human-Centred Design Approaches, Methods, Tools, and Environments, 2013

2012
Knowledge Combination in Graphical Multiagent Model.
CoRR, 2012

Reports of the AAAI 2011 Conference Workshops.
AI Mag., 2012

Lossy stochastic game abstraction with bounds.
Proceedings of the 13th ACM Conference on Electronic Commerce, 2012

Optimal rewards in multiagent teams.
Proceedings of the 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics, 2012

Planning and evaluating multiagent influences under reward uncertainty.
Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012

Learning and predicting dynamic networked behavior with graphical multiagent models.
Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012

Strong mitigation: nesting search for good policies within search for good reward.
Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012

Security Games with Limited Surveillance: An Initial Report.
Proceedings of the Game Theory for Security, 2012

Computing Stackelberg Equilibria in Discounted Stochastic Games.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

Security Games with Limited Surveillance.
Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2011
IP Geolocation in Metropolitan Areas.
PhD thesis, 2011

Learning to Make Predictions In Partially Observable Environments Without a Generative Model.
J. Artif. Intell. Res., 2011

Modeling Information Diffusion in Networks with Unobserved Links.
Proceedings of the PASSAT/SocialCom 2011, Privacy, 2011

IP geolocation in metropolitan areas.
Proceedings of the SIGMETRICS 2011, 2011

Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

Comparing Action-Query Strategies in Semi-Autonomous Agents.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010
Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective.
IEEE Trans. Auton. Ment. Dev., 2010

Dynamic Incentive Mechanisms.
AI Mag., 2010

Variance-Based Rewards for Approximate Bayesian Reinforcement Learning.
Proceedings of the UAI 2010, 2010

Reward Design via Online Gradient Ascent.
Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Internal Rewards Mitigate Agent Boundedness.
Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Selecting Operator Queries Using Expected Myopic Gain.
Proceedings of the 2010 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2010

Linear options.
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), 2010

History-dependent graphical multiagent models.
Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), 2010

2009
Maintaining Predictions over Time without a Model.
Proceedings of the IJCAI 2009, 2009

Learning Graphical Game Models.
Proceedings of the IJCAI 2009, 2009

Transfer via soft homomorphisms.
Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), 2009

SarsaLandmark: an algorithm for learning in POMDPs with landmarks.
Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), 2009

2008
Knowledge Combination in Graphical Multiagent Models.
Proceedings of the UAI 2008, 2008

Simple Local Models for Complex Dynamical Systems.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Building Incomplete but Accurate Models.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Predictive Linear-Gaussian Models of Dynamical Systems with Vector-Valued Actions and Observations.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Efficiently learning linear-linear exponential family predictive representations of state.
Proceedings of the Machine Learning, 2008

Approximate predictive state representations.
Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008), 2008

2007
Learning payoff functions in infinite games.
Mach. Learn., 2007

DaNaLIX: a domain-adaptive natural language interface for querying XML.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007

Exponential Family Predictive Representations of State.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Relational Knowledge with Predictive State Representations.
Proceedings of the IJCAI 2007, 2007

An Experts Algorithm for Transfer Learning.
Proceedings of the IJCAI 2007, 2007

On discovery and learning of models with predictive representations of state for agents with continuous actions and observations.
Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), 2007

Constraint satisfaction algorithms for graphical games.
Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), 2007

Abstraction in Predictive State Representations.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

Enabling Domain-Awareness for a Generic Natural Language Interface.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

2006
Cobot in LambdaMOO: An Adaptive Social Statistics Agent.
Auton. Agents Multi Agent Syst., 2006

Optimal Coordinated Planning Amongst Self-Interested Agents with Private State.
Proceedings of the UAI '06, 2006

Predictive state representations with options.
Proceedings of the Machine Learning, 2006

Kernel Predictive Linear Gaussian models for nonlinear stochastic dynamical systems.
Proceedings of the Machine Learning, 2006

Predictive linear-Gaussian models of controlled stochastic dynamical systems.
Proceedings of the Machine Learning, 2006

Mixtures of Predictive Linear Gaussian Models for Nonlinear, Stochastic Dynamical Systems.
Proceedings of the Proceedings, 2006

Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains.
Proceedings of the Proceedings, 2006

2005
Strategic Interactions in a Supply Chain Game.
Comput. Intell., 2005

Reports on the 2004 AAAI Fall Symposia.
AI Mag., 2005

Predictive Linear-Gaussian Models of Stochastic Dynamical Systems.
Proceedings of the UAI '05, 2005

Off-policy Learning with Options and Recognizers.
Proceedings of the Advances in Neural Information Processing Systems 18 [Neural Information Processing Systems, 2005

Combining Memory and Landmarks with Predictive State Representations.
Proceedings of the IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30, 2005

Learning predictive state representations in dynamical systems without reset.
Proceedings of the Machine Learning, 2005

Planning in Models that Combine Memory with Predictive Representations of State.
Proceedings of the Proceedings, 2005

2004
Value-driven procurement in the TAC supply chain game.
SIGecom Exch., 2004

Predictive State Representations: A New Theory for Modeling Dynamical Systems.
Proceedings of the UAI '04, 2004

Computing approximate bayes-nash equilibria in tree-games of incomplete information.
Proceedings of the Proceedings 5th ACM Conference on Electronic Commerce (EC-2004), 2004

Intrinsically Motivated Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 17 [Neural Information Processing Systems, 2004

Approximately Efficient Online Mechanism Design.
Proceedings of the Advances in Neural Information Processing Systems 17 [Neural Information Processing Systems, 2004

Planning with predictive state representations.
Proceedings of the 2004 International Conference on Machine Learning and Applications, 2004

Adaptive cognitive orthotics: combining reinforcement learning and constraint-based temporal reasoning.
Proceedings of the Machine Learning, 2004

Learning and discovery of predictive state representations in dynamical systems with reset.
Proceedings of the Machine Learning, 2004

Strategic Interactions in the TAC 2003 Supply Chain Tournament.
Proceedings of the Computers and Games, 4th International Conference, 2004

Distributed Feedback Control for Decision Making on Supply Chains.
Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), 2004

2003
A Nonlinear Predictive State Representation.
Proceedings of the Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, 2003

An MDP-Based Approach to Online Mechanism Design.
Proceedings of the Advances in Neural Information Processing Systems 16 [Neural Information Processing Systems, 2003

Learning Predictive State Representations.
Proceedings of the Machine Learning, 2003

2002
Introduction.
Mach. Learn., 2002

Near-Optimal Reinforcement Learning in Polynomial Time.
Mach. Learn., 2002

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System.
J. Artif. Intell. Res., 2002

CobotDS: A Spoken Dialogue System for Chat.
Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, July 28, 2002

2001
ATTac-2000: An Adaptive Autonomous Bidding Agent.
J. Artif. Intell. Res., 2001

FAucS : An FCC Spectrum Auction Simulator for Autonomous Bidding Agents.
Proceedings of the Electronic Commerce, Second International Workshop, 2001

Graphical Models for Game Theory.
Proceedings of the UAI '01: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 2001

Predictive Representations of State.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, 2001

An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, 2001

Cobot: A Social Reinforcement Learning Agent.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic, 2001

A social reinforcement learning agent.
Proceedings of the Fifth International Conference on Autonomous Agents, 2001

2000
Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms.
Mach. Learn., 2000

Nash Convergence of Gradient Dynamics in General-Sum Games.
Proceedings of the UAI '00: Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, California, USA, June 30, 2000

Fast Planning in Stochastic Games.
Proceedings of the UAI '00: Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, California, USA, June 30, 2000

Reinforcement Learning for 3 vs. 2 Keepaway.
Proceedings of the RoboCup 2000: Robot Soccer World Cup IV, 2000

Eligibility Traces for Off-Policy Policy Evaluation.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

A Boosting Approach to Topic Spotting on Subdialogues.
Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

Bias-Variance Error Bounds for Temporal Difference Updates.
Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT 2000), June 28, 2000

Automatic Optimization of Dialogue Management.
Proceedings of the COLING 2000, 18th International Conference on Computational Linguistics, Proceedings of the Conference, 2 Volumes, July 31, 2000

Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System.
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, July 30, 2000

Cobot in LambdaMOO: A Social Statistics Agent.
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, July 30, 2000

1999
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.
Artif. Intell., 1999

Approximate Planning for Factored POMDPs using Belief State Simplification.
Proceedings of the UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30, 1999

On the Complexity of Policy Iteration.
Proceedings of the UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30, 1999

Policy Gradient Methods for Reinforcement Learning with Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, Colorado, USA, November 29, 1999

Reinforcement Learning for Spoken Dialogue Systems.
Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, Colorado, USA, November 29, 1999

1998
Analytical Mean Squared Error Curves for Temporal Difference Learning.
Mach. Learn., 1998

Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Improved Switching among Temporally Abstract Actions.
Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Intra-Option Learning about Temporally Abstract Actions.
Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes.
Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Near-Optimal Reinforcement Learning in Polynomial Time.
Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Theoretical Results on Reinforcement Learning with Temporally Abstract Options.
Proceedings of the Machine Learning: ECML-98, 1998

1997
How to Dynamically Merge Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 10, 1997

1996
Reinforcement Learning with Replacing Eligibility Traces.
Mach. Learn., 1996

Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems.
Proceedings of the Advances in Neural Information Processing Systems 9, 1996

Predicting Lifetimes in Dynamically Allocated Memory.
Proceedings of the Advances in Neural Information Processing Systems 9, 1996

Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards.
Proceedings of the Ninth Annual Conference on Computational Learning Theory, 1996

1995
Learning to Act Using Real-Time Dynamic Programming.
Artif. Intell., 1995

Improving Policies without Measuring Merits.
Proceedings of the Advances in Neural Information Processing Systems 8, 1995

Markov Decision Processes in Large State Spaces.
Proceedings of the Eighth Annual Conference on Computational Learning Theory, 1995

1994
On the Convergence of Stochastic Iterative Dynamic Programming Algorithms.
Neural Comput., 1994

An Upper Bound on the Loss from Approximate Optimal-Value Functions.
Mach. Learn., 1994

Reinforcement Learning with Soft State Aggregation.
Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.
Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Learning Without State-Estimation in Partially Observable Markovian Decision Processes.
Proceedings of the Machine Learning, 1994

Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes.
Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA, USA, July 31, 1994

1993
Robust Reinforcement Learning in Motion Planning.
Proceedings of the Advances in Neural Information Processing Systems 6, 1993

1992
Transfer of Learning by Composing Solutions of Elemental Sequential Tasks.
Mach. Learn., 1992

Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models.
Proceedings of the Ninth International Workshop on Machine Learning (ML 1992), 1992

Reinforcement Learning with a Hierarchy of Abstract Models.
Proceedings of the 10th National Conference on Artificial Intelligence, 1992

1991
The Efficient Learning of Multiple Task Sequences.
Proceedings of the Advances in Neural Information Processing Systems 4, 1991

A Cortico-Cerebellar Model that Learns to Generate Distributed Motor Commands to Control a Kinematic Arm.
Proceedings of the Advances in Neural Information Processing Systems 4, 1991

Transfer of Learning Across Compositions of Sequential Tasks.
Proceedings of the Eighth International Workshop (ML91), 1991

