Satinder Singh

Artif. Intell., April, 2023

POMRL: No-Regret Learning-to-Plan with Increasing Horizons.

Trans. Mach. Learn. Res., 2023

Diversifying AI: Towards Creative Chess with AlphaZero.

CoRR, 2023

On the Convergence of Bounded Agents.

CoRR, 2023

Hierarchical Reinforcement Learning in Complex 3D Environments.

Bernardo Ávila Pires

Feryal M. P. Behbahani

CoRR, 2023

Optimistic Meta-Gradients.

Sebastian Flennerhag

Tom Zahavy

Brendan O'Donoghue

Hado Philip van Hasselt

András György

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Combining Behaviors with the Successor Features Keyboard.

Danilo Jimenez Rezende

Daniel Zoran

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Large Language Models can Implement Policy Iteration.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Definition of Continual Reinforcement Learning.

Hado Philip van Hasselt

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Structured State Space Models for In-Context Reinforcement Learning.

Feryal M. P. Behbahani

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs.

Proceedings of the International Conference on Machine Learning, 2023

Human-Timescale Adaptation in an Open-Ended Task Space.

Proceedings of the International Conference on Machine Learning, 2023

Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality.

Tom Zahavy

Yannick Schroecker

Feryal M. P. Behbahani

Proceedings of the Eleventh International Conference on Learning Representations, 2023

In-context Reinforcement Learning with Algorithm Distillation.

Steven Stenberg Hansen

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Composing Task Knowledge With Modular Successor Feature Approximators.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Discovering Evolution Strategies via Meta-Black-Box Optimization.

Proceedings of the Companion Proceedings of the Conference on Genetic and Evolutionary Computation, 2023

2022

Figure Data for the paper "Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning".

Dataset, October, 2022

In-Context Policy Iteration.

CoRR, 2022

Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning.

CoRR, 2022

GrASP: Gradient-Based Affordance Selection for Planning.

CoRR, 2022

Palm up: Playing in the Latent Manifold for Unsupervised Pretraining.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Approximate Value Equivalence.

Christopher Grimm

André Barreto

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction.

Dilip Arumugam

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Expressivity of Markov Reward (Extended Abstract).

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Bootstrapped Meta-Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Meta-Gradients in Non-Stationary Environments.

Proceedings of the Conference on Lifelong Learning Agents, 2022

Adaptive Pairwise Weights for Temporal Credit Assignment.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Learning to Learn End-to-End Goal-Oriented Dialog From Related Dialog Tasks.

Jonathan K. Kummerfeld

CoRR, 2021

Discovering Diverse Nearly Optimal Policies with Successor Features.

CoRR, 2021

Pairwise Weights for Temporal Credit Assignment.

CoRR, 2021

Reward is enough.

Artif. Intell., 2021

Learning State Representations from Random Deep Action-conditional Predictions.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Reward is enough for convex MDPs.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Discovery of Options via Meta-Learned Subgoals.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Proper Value Equivalence.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Expressivity of Markov Reward.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in a First-person Simulated 3D Environment.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Reinforcement Learning of Implicit and Explicit Control Flow Instructions.

Ethan A. Brooks

Proceedings of the 38th International Conference on Machine Learning, 2021

Discovering a set of policies for the worst case reward.

Proceedings of the 9th International Conference on Learning Representations, 2021

Efficient Querying for Cooperative Probabilistic Commitments.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Reinforcement Learning for Sparse-Reward Object-Interaction Tasks in First-person Simulated 3D Environments.

CoRR, 2020

Self-Tuning Deep Reinforcement Learning.

CoRR, 2020

Semantics and algorithms for trustworthy commitment achievement under model uncertainty.

Auton. Agents Multi Agent Syst., 2020

A Self-Tuning Actor-Critic Algorithm.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Meta-Gradient Reinforcement Learning with an Objective Discovered Online.

Zhongwen Xu

Hado Philip van Hasselt

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

On Efficiency in Hierarchical Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Discovering Reinforcement Learning Algorithms.

Matteo Hessel

Wojciech M. Czarnecki

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

The Value Equivalence Principle for Model-Based Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning to Play No-Press Diplomacy with Best Response Policy Iteration.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

What Can Learned Intrinsic Rewards Capture?

Proceedings of the 37th International Conference on Machine Learning, 2020

Behaviour Suite for Reinforcement Learning.

Proceedings of the 8th International Conference on Learning Representations, 2020

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles.

Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Querying to Find a Safe Policy under Uncertain Safety Constraints in Markov Decision Processes.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

How Should an Agent Practice?

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Modeling Probabilistic Commitments for Maintenance Is Inherently Harder than for Achievement.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Online and Scalable Adaptive Cyber Defense.

Benjamin W. Priest

George Cybenko

Massimiliano Albanese

Peng Liu

Proceedings of the Adversarial and Uncertain Reasoning for Adaptive Cyber Defense, 2019

Disentangled Cumulants Help Successor Representations Transfer to New Tasks.

CoRR, 2019

Object-oriented state editing for HRL.

Victor Bapst

Alvaro Sanchez-Gonzalez

Omar Shams

Kimberly L. Stachenfeld

Peter W. Battaglia

Jessica B. Hamrick

CoRR, 2019

Learning Independently-Obtainable Reward Functions.

Christopher Grimm

CoRR, 2019

NE-Table: A Neural key-value table for Named Entities.

Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2019

Discovery of Useful Questions as Auxiliary Tasks.

Vivek Veeriah

Matteo Hessel

Zhongwen Xu

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

No-Press Diplomacy: Modeling Multi-Agent Gameplay.

Jonathan K. Kummerfeld

Joelle Pineau

Aaron C. Courville

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Hindsight Credit Assignment.

Anna Harutyunyan

Will Dabney

Thomas Mesnard

Mohammad Gheshlaghi Azar

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Computational Strategies for the Trustworthy Pursuit and the Safe Modeling of Probabilistic Maintenance Commitments.

Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019

Deep Reinforcement Learning for Multi-driver Vehicle Dispatching and Repositioning Problem.

Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

Learning to Communicate and Solve Visual Blocks-World Tasks.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Multistage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis.

Secur. Commun. Networks, 2018

Generative Adversarial Self-Imitation Learning.

CoRR, 2018

Many-Goals Reinforcement Learning.

Vivek Veeriah

CoRR, 2018

Named Entities troubling your Neural Methods? Build NE-Table: A neural approach for handling Named Entities.

CoRR, 2018

The Advantage of Doubling: A Deep Reinforcement Learning Approach to Studying the Double Team in the NBA.

CoRR, 2018

On Learning Intrinsic Rewards for Policy Gradient Methods.

Zeyu Zheng

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Completing State Representations using Spectral Learning.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Minimax-Regret Querying on Side Effects for Safe Optimality in Factored Markov Decision Processes.

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Self-Imitation Learning.

Proceedings of the 35th International Conference on Machine Learning, 2018

Learning End-to-End Goal-Oriented Dialog with Multiple Answers.

Jatin Ganhotra

Lazaros Polymenakos

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Challenges in the Trustworthy Pursuit of Maintenance Commitments Under Uncertainty.

Proceedings of the 20th International Trust Workshop co-located with AAMAS/IJCAI/ECAI/ICML 2018, 2018

On Querying for Safe Optimality in Factored Markov Decision Processes.

Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

Markov Decision Processes with Continuous Side Information.

Proceedings of the Algorithmic Learning Theory, 2018

2017

Value Prediction Network.

Honglak Lee

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Repeated Inverse Reinforcement Learning.

Kareem Amin

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning.

Proceedings of the 34th International Conference on Machine Learning, 2017

Learning to Query, Reason, and Answer Questions On Ambiguous Texts.

Proceedings of the 5th International Conference on Learning Representations, 2017

A Stackelberg Game Model for Botnet Data Exfiltration.

Thanh Hong Nguyen

Proceedings of the Decision and Game Theory for Security - 8th International Conference, 2017

Predicting Counselor Behaviors in Motivational Interviewing Encounters.

Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Multi-Stage Attack Graph Security Games: Heuristic Strategies, with Empirical Game-Theoretic Analysis.

Proceedings of the 2017 Workshop on Moving Target Defense, 2017

Minimizing Maximum Regret in Commitment Constrained Sequential Decision Making.

Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling, 2017

Approximately-Optimal Queries for Planning in Reward-Uncertain Markov Decision Processes.

Proceedings of the Twenty-Seventh International Conference on Automated Planning and Scheduling, 2017

Understanding and Predicting Empathic Behavior in Counseling Therapy.

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

A Stackelberg Game Model for Botnet Traffic Exfiltration.

Thanh Hong Nguyen

Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Multi-task seizure detection: addressing intra-patient variation in seizure morphologies.

Alexander Van Esbroeck

Mach. Learn., 2016

Towards Resolving Unidentifiability in Inverse Reinforcement Learning.

Kareem Amin

CoRR, 2016

Gradient Methods for Stackelberg Games.

Kareem Amin

Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, 2016

Building a Motivational Interviewing Dataset.

Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2016

Commitment Semantics for Sequential Decision Making under Reward Uncertainty.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

On Structural Properties of MDPs that Bound Loss Due to Shallow Planning.

Ambuj Tewari

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

The Dependence of Effective Planning Horizon on Model Accuracy.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Deep Learning for Reward Design to Improve Monte Carlo Tree Search in ATARI Games.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Control of Memory, Active Perception, and Action in Minecraft.

Valliappa Chockalingam

Honglak Lee

Proceedings of the 33rd International Conference on Machine Learning, 2016

On the Trustworthy Fulfillment of Commitments.

Proceedings of the 18th International Workshop on Trust in Agent Societies co-located with the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), 2016

Improving Predictive State Representations via Gradient Descent.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Action-Conditional Video Prediction using Deep Networks in Atari Games.

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Abstraction Selection in Model-based Reinforcement Learning.

Proceedings of the 32nd International Conference on Machine Learning, 2015

Low-Rank Spectral Learning with Weighted Loss Functions.

Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Commitment Semantics for Sequential Decision Making Under Reward Uncertainty.

Proceedings of the 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015, 2015

Spectral Learning of Predictive State Representations with Insufficient Statistics.

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Computational Rationality: Linking Mechanism and Behavior Through Bounded Utility Maximization.

Andrew Howes

Top. Cogn. Sci., 2014

Utility Maximization and Bounds on Human Information Processing.

Andrew Howes

Top. Cogn. Sci., 2014

Optimal Rewards for Cooperative Agents.

IEEE Trans. Auton. Ment. Dev., 2014

Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Ecologically valid long-term mood monitoring of individuals with bipolar disorder using speech.

Proceedings of the IEEE International Conference on Acoustics, 2014

Improving UCT planning via approximate homomorphisms.

Proceedings of the International conference on Autonomous Agents and Multi-Agent Systems, 2014

Low-Rank Spectral Learning.

N. Raj Rao

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014

Characterizing EVOI-Sufficient k-Response Query Sets in Decision Problems.

Robert Cohn

Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 2014

Computing Solutions in Infinite-Horizon Discounted Adversarial Patrolling Games.

Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling, 2014

Computationally Rational Saccadic Control: An Explanation of Spillover Effects Based on Sampling from Noisy Perception and Memory.

Michael Shvartsman

Proceedings of the Fifth Workshop on Cognitive Modeling and Computational Linguistics, 2014

Evaluating Trauma Patients: Addressing Missing Covariates with Joint Optimization.

Alexander Van Esbroeck

Ilan Rubinfeld

Zeeshan Syed

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

Predicting Postoperative Atrial Fibrillation from Independent ECG Components.

Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2013

The Adaptive Nature of Eye Movements in Linguistic Tasks: How Payoff and Architecture Shape Speed-Accuracy Trade-Offs.

Michael Shvartsman

Top. Cogn. Sci., 2013

Nash Convergence of Gradient Dynamics in Iterated General-Sum Games.

CoRR, 2013

Reward Mapping for Transfer in Long-Lived Agents.

Xiaoxiao Guo

Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Linking Context to Evaluation in the Design of Safety Critical Interfaces.

Proceedings of the Human-Computer Interaction. Human-Centred Design Approaches, Methods, Tools, and Environments, 2013

2012

Knowledge Combination in Graphical Multiagent Model.

Quang Duong

CoRR, 2012

Reports of the AAAI 2011 Conference Workshops.

AI Mag., 2012

Lossy stochastic game abstraction with bounds.

Tuomas Sandholm

Proceedings of the 13th ACM Conference on Electronic Commerce, 2012

Optimal rewards in multiagent teams.

Proceedings of the 2012 IEEE International Conference on Development and Learning and Epigenetic Robotics, 2012

Planning and evaluating multiagent influences under reward uncertainty.

Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012

Learning and predicting dynamic networked behavior with graphical multiagent models.

Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012

Strong mitigation: nesting search for good policies within search for good reward.

Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, 2012

Security Games with Limited Surveillance: An Initial Report.

Bo An

David Kempe

Proceedings of the Game Theory for Security, 2012

Computing Stackelberg Equilibria in Discounted Stochastic Games.

Yevgeniy Vorobeychik

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

Security Games with Limited Surveillance.

Bo An

David Kempe

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2011

IP Geolocation in Metropolitan Areas.

Satinder Pal Singh

PhD thesis, 2011

Learning to Make Predictions In Partially Observable Environments Without a Generative Model.

J. Artif. Intell. Res., 2011

Modeling Information Diffusion in Networks with Unobserved Links.

Quang Duong

Proceedings of the PASSAT/SocialCom 2011, Privacy, 2011

IP geolocation in metropolitan areas.

Proceedings of the SIGMETRICS 2011, 2011

Optimal Rewards versus Leaf-Evaluation Heuristics in Planning Agents.

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

Comparing Action-Query Strategies in Semi-Autonomous Agents.

Robert Cohn

Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010

Intrinsically Motivated Reinforcement Learning: An Evolutionary Perspective.

IEEE Trans. Auton. Ment. Dev., 2010

Dynamic Incentive Mechanisms.

AI Mag., 2010

Variance-Based Rewards for Approximate Bayesian Reinforcement Learning.

Proceedings of the UAI 2010, 2010

Reward Design via Online Gradient Ascent.

Proceedings of the Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, 2010

Internal Rewards Mitigate Agent Boundedness.

Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010

Selecting Operator Queries Using Expected Myopic Gain.

Proceedings of the 2010 IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2010

Linear options.

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), 2010

History-dependent graphical multiagent models.

Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2010), 2010

2009

Maintaining Predictions over Time without a Model.

Proceedings of the IJCAI 2009, 2009

Learning Graphical Game Models.

Proceedings of the IJCAI 2009, 2009

Transfer via soft homomorphisms.

Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), 2009

SarsaLandmark: an algorithm for learning in POMDPs with landmarks.

Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2009), 2009

2008

Knowledge Combination in Graphical Multiagent Models.

Quang Duong

Proceedings of the UAI 2008, 2008

Simple Local Models for Complex Dynamical Systems.

Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Building Incomplete but Accurate Models.

Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Predictive Linear-Gaussian Models of Dynamical Systems with Vector-Valued Actions and Observations.

Proceedings of the International Symposium on Artificial Intelligence and Mathematics, 2008

Efficiently learning linear-linear exponential family predictive representations of state.

Proceedings of the Machine Learning, 2008

Approximate predictive state representations.

Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2008), 2008

2007

Learning payoff functions in infinite games.

Yevgeniy Vorobeychik

Mach. Learn., 2007

DaNaLIX: a domain-adaptive natural language interface for querying XML.

Proceedings of the ACM SIGMOD International Conference on Management of Data, 2007

Exponential Family Predictive Representations of State.

Proceedings of the Advances in Neural Information Processing Systems 20, 2007

Relational Knowledge with Predictive State Representations.

Proceedings of the IJCAI 2007, 2007

An Experts Algorithm for Transfer Learning.

Proceedings of the IJCAI 2007, 2007

On discovery and learning of models with predictive representations of state for agents with continuous actions and observations.

Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), 2007

Constraint satisfaction algorithms for graphical games.

Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), 2007

Abstraction in Predictive State Representations.

Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

Enabling Domain-Awareness for a Generic Natural Language Interface.

Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

2006

Cobot in LambdaMOO: An Adaptive Social Statistics Agent.

Auton. Agents Multi Agent Syst., 2006

Optimal Coordinated Planning Amongst Self-Interested Agents with Private State.

Ruggiero Cavallo

David C. Parkes

Proceedings of the UAI '06, 2006

Predictive state representations with options.

Proceedings of the Machine Learning, 2006

Kernel Predictive Linear Gaussian models for nonlinear stochastic dynamical systems.

Proceedings of the Machine Learning, 2006

Predictive linear-Gaussian models of controlled stochastic dynamical systems.

Proceedings of the Machine Learning, 2006

Mixtures of Predictive Linear Gaussian Models for Nonlinear, Stochastic Dynamical Systems.

Proceedings of the Proceedings, 2006

Using Homomorphisms to Transfer Options across Continuous Reinforcement Learning Domains.

Proceedings of the Proceedings, 2006

2005

Strategic Interactions in a Supply Chain Game.

Comput. Intell., 2005

Reports on the 2004 AAAI Fall Symposia.

Nicholas L. Cassimatis

AI Mag., 2005

Predictive Linear-Gaussian Models of Stochastic Dynamical Systems.

Proceedings of the UAI '05, 2005

Off-policy Learning with Options and Recognizers.

Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Combining Memory and Landmarks with Predictive State Representations.

Proceedings of the IJCAI-05, Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence, Edinburgh, Scotland, UK, July 30, 2005

Learning predictive state representations in dynamical systems without reset.

Proceedings of the Machine Learning, 2005

Planning in Models that Combine Memory with Predictive Representations of State.

Proceedings of the Proceedings, 2005

2004

Value-driven procurement in the TAC supply chain game.

SIGecom Exch., 2004

Predictive State Representations: A New Theory for Modeling Dynamical Systems.

Proceedings of the UAI '04, 2004

Computing approximate bayes-nash equilibria in tree-games of incomplete information.

Proceedings of the Proceedings 5th ACM Conference on Electronic Commerce (EC-2004), 2004

Intrinsically Motivated Reinforcement Learning.

Andrew G. Barto

Nuttapong Chentanez

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Approximately Efficient Online Mechanism Design.

David C. Parkes

Dimah Yanovsky

Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Planning with predictive state representations.

Proceedings of the 2004 International Conference on Machine Learning and Applications, 2004

Adaptive cognitive orthotics: combining reinforcement learning and constraint-based temporal reasoning.

Martha E. Pollack

Proceedings of the Machine Learning, 2004

Learning and discovery of predictive state representations in dynamical systems with reset.

[BibT_eX]

[DOI]

Proceedings of the Machine Learning, 2004

Strategic Interactions in the TAC 2003 Supply Chain Tournament.

[BibT_eX]

[DOI]

Proceedings of the Computers and Games, 4th International Conference, 2004

Distributed Feedback Control for Decision Making on Supply Chains.

[BibT_eX]

[DOI]

Proceedings of the Fourteenth International Conference on Automated Planning and Scheduling (ICAPS 2004), 2004

2003

A Nonlinear Predictive State Representation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 16, 2003

An MDP-Based Approach to Online Mechanism Design.

[BibT_eX]

[DOI]

David C. Parkes

Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Learning Predictive State Representations.

[BibT_eX]

[DOI]

Proceedings of the Machine Learning, 2003

2002

Introduction.

[BibT_eX]

[DOI]

Mach. Learn., 2002

Near-Optimal Reinforcement Learning in Polynomial Time.

[BibT_eX]

[DOI]

Mach. Learn., 2002

Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System.

[BibT_eX]

[DOI]

J. Artif. Intell. Res., 2002

CobotDS: A Spoken Dialogue System for Chat.

[BibT_eX]

[DOI]

Diane J. Litman

Jessica Howe

Proceedings of the Eighteenth National Conference on Artificial Intelligence and Fourteenth Conference on Innovative Applications of Artificial Intelligence, July 28, 2002

2001

ATTac-2000: An Adaptive Autonomous Bidding Agent.

[BibT_eX]

[DOI]

J. Artif. Intell. Res., 2001

FAucS: An FCC Spectrum Auction Simulator for Autonomous Bidding Agents.

[BibT_eX]

[DOI]

Proceedings of the Electronic Commerce, Second International Workshop, 2001

Graphical Models for Game Theory.

[BibT_eX]

[DOI]

Proceedings of the UAI '01: Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, 2001

Predictive Representations of State.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 14, Neural Information Processing Systems: Natural and Synthetic, 2001

An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 14, Neural Information Processing Systems: Natural and Synthetic, 2001

Cobot: A Social Reinforcement Learning Agent.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 14, Neural Information Processing Systems: Natural and Synthetic, 2001

A social reinforcement learning agent.

[BibT_eX]

[DOI]

Proceedings of the Fifth International Conference on Autonomous Agents, 2001

2000

Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms.

[BibT_eX]

[DOI]

Mach. Learn., 2000

Nash Convergence of Gradient Dynamics in General-Sum Games.

[BibT_eX]

[DOI]

Proceedings of the UAI '00: Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, California, USA, June 30, 2000

Fast Planning in Stochastic Games.

[BibT_eX]

[DOI]

Proceedings of the UAI '00: Proceedings of the 16th Conference in Uncertainty in Artificial Intelligence, Stanford University, Stanford, California, USA, June 30, 2000

Reinforcement Learning for 3 vs. 2 Keepaway.

[BibT_eX]

[DOI]

Peter Stone

Proceedings of the RoboCup 2000: Robot Soccer World Cup IV, 2000

Eligibility Traces for Off-Policy Policy Evaluation.

[BibT_eX]

Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

A Boosting Approach to Topic Spotting on Subdialogues.

[BibT_eX]

Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29, 2000

Bias-Variance Error Bounds for Temporal Difference Updates.

[BibT_eX]

Proceedings of the Thirteenth Annual Conference on Computational Learning Theory (COLT 2000), June 28, 2000

Automatic Optimization of Dialogue Management.

[BibT_eX]

[DOI]

Proceedings of the COLING 2000, 18th International Conference on Computational Linguistics, Proceedings of the Conference, 2 Volumes, July 31, 2000

Empirical Evaluation of a Reinforcement Learning Spoken Dialogue System.

[BibT_eX]

[DOI]

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, July 30, 2000

Cobot in LambdaMOO: A Social Statistics Agent.

[BibT_eX]

[DOI]

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence, July 30, 2000

1999

Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning.

[BibT_eX]

[DOI]

Artif. Intell., 1999

Approximate Planning for Factored POMDPs using Belief State Simplification.

[BibT_eX]

[DOI]

David A. McAllester

Proceedings of the UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30, 1999

On the Complexity of Policy Iteration.

[BibT_eX]

[DOI]

Proceedings of the UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, July 30, 1999

Policy Gradient Methods for Reinforcement Learning with Function Approximation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, Colorado, USA, November 29, 1999

Reinforcement Learning for Spoken Dialogue Systems.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 12, NIPS Conference, Denver, Colorado, USA, November 29, 1999

1998

Analytical Mean Squared Error Curves for Temporal Difference Learning.

[BibT_eX]

[DOI]

Peter Dayan

Mach. Learn., 1998

Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes.

[BibT_eX]

[DOI]

John K. Williams

Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Improved Switching among Temporally Abstract Actions.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Optimizing Admission Control while Ensuring Quality of Service in Multimedia Networks via Reinforcement Learning.

[BibT_eX]

[DOI]

Timothy X. Brown

Hui Tong

Proceedings of the Advances in Neural Information Processing Systems 11, NIPS Conference, Denver, Colorado, USA, November 30, 1998

Intra-Option Learning about Temporally Abstract Actions.

[BibT_eX]

Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes.

[BibT_eX]

John Loch

Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Near-Optimal Reinforcement Learning in Polynomial Time.

[BibT_eX]

Proceedings of the Fifteenth International Conference on Machine Learning (ICML 1998), 1998

Theoretical Results on Reinforcement Learning with Temporally Abstract Options.

[BibT_eX]

[DOI]

Proceedings of the Machine Learning: ECML-98, 1998

1997

How to Dynamically Merge Markov Decision Processes.

[BibT_eX]

[DOI]

David Cohn

Proceedings of the Advances in Neural Information Processing Systems 10, 1997

1996

Reinforcement Learning with Replacing Eligibility Traces.

[BibT_eX]

[DOI]

Mach. Learn., 1996

Reinforcement Learning for Dynamic Channel Allocation in Cellular Telephone Systems.

[BibT_eX]

[DOI]

Dimitri P. Bertsekas

Proceedings of the Advances in Neural Information Processing Systems 9, 1996

Predicting Lifetimes in Dynamically Allocated Memory.

[BibT_eX]

[DOI]

David A. Cohn

Proceedings of the Advances in Neural Information Processing Systems 9, 1996

Learning Curve Bounds for a Markov Decision Process with Undiscounted Rewards.

[BibT_eX]

[DOI]

Lawrence K. Saul

Proceedings of the Ninth Annual Conference on Computational Learning Theory, 1996

1995

Learning to Act Using Real-Time Dynamic Programming.

[BibT_eX]

[DOI]

Andrew G. Barto

Steven J. Bradtke

Artif. Intell., 1995

Improving Policies without Measuring Merits.

[BibT_eX]

[DOI]

Peter Dayan

Proceedings of the Advances in Neural Information Processing Systems 8, 1995

Markov Decision Processes in Large State Spaces.

[BibT_eX]

[DOI]

Lawrence K. Saul

Proceedings of the Eighth Annual Conference on Computational Learning Theory, 1995

1994

On the Convergence of Stochastic Iterative Dynamic Programming Algorithms.

[BibT_eX]

[DOI]

Neural Comput., 1994

An Upper Bound on the Loss from Approximate Optimal-Value Functions.

[BibT_eX]

[DOI]

Richard C. Yee

Mach. Learn., 1994

Reinforcement Learning with Soft State Aggregation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 7, 1994

Learning Without State-Estimation in Partially Observable Markovian Decision Processes.

[BibT_eX]

[DOI]

Proceedings of the Machine Learning, 1994

Reinforcement Learning Algorithms for Average-Payoff Markovian Decision Processes.

[BibT_eX]

[DOI]

Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA, USA, July 31, 1994

1993

Robust Reinforcement Learning in Motion Planning.

[BibT_eX]

[DOI]

Andrew G. Barto

Roderic A. Grupen

Christopher I. Connolly

Proceedings of the Advances in Neural Information Processing Systems 6, 1993

1992

Transfer of Learning by Composing Solutions of Elemental Sequential Tasks.

[BibT_eX]

[DOI]

Satinder Pal Singh

Mach. Learn., 1992

Scaling Reinforcement Learning Algorithms by Learning Variable Temporal Resolution Models.

[BibT_eX]

[DOI]

Proceedings of the Ninth International Workshop on Machine Learning (ML 1992), 1992

Reinforcement Learning with a Hierarchy of Abstract Models.

[BibT_eX]

[DOI]

Proceedings of the 10th National Conference on Artificial Intelligence, 1992

1991

The Efficient Learning of Multiple Task Sequences.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 4, 1991

A Cortico-Cerebellar Model that Learns to Generate Distributed Motor Commands to Control a Kinematic Arm.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 4, 1991

Transfer of Learning Across Compositions of Sequential Tasks.

[BibT_eX]

[DOI]