2025

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints.

[DOI]

Yaswanth Chittepu

,

Blossom Metevier

,

,

,

,

Philip S. Thomas

CoRR, June, 2025

A Descriptive and Normative Theory of Human Beliefs in RLHF.

[DOI]

,

Shripad Deshmukh

,

,

W. Bradley Knox

,

CoRR, June, 2025

Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation.

[DOI]

Tuhina Tripathi

,

,

,

CoRR, April, 2025

Fast Adaptation with Behavioral Foundation Models.

[DOI]

,

Andrea Tirinzoni

,

,

,

Anssi Kanervisto

,

,

,

Alessandro Lazaric

,

CoRR, April, 2025

Supervised Reward Inference.

[DOI]

,

Jordan Schneider

,

Philip S. Thomas

,

CoRR, February, 2025

Influencing Humans to Conform to Preference Models for RLHF.

[DOI]

Stephane Hatgis-Kessell

,

W. Bradley Knox

,

,

,

CoRR, January, 2025

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning.

[DOI]

,

,

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning.

[DOI]

,

,

,

,

Siddhant Agarwal

,

,

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Models of human preference for learning reward functions.

[DOI]

W. Bradley Knox

,

Stephane Hatgis-Kessell

,

,

,

,

Alessandro Gabriele Allievi

Trans. Mach. Learn. Res., 2024

Granger Causal Interaction Skill Chains.

[DOI]

,

,

,

,

Trans. Mach. Learn. Res., 2024

RL Zero: Zero-Shot Language to Behaviors without any Supervision.

[DOI]

,

Siddhant Agarwal

,

,

Samyak Parajuli

,

,

,

,

,

CoRR, 2024

Pareto-Optimal Learning from Preferences with Hidden Context.

[DOI]

,

,

,

CoRR, 2024

Robot Air Hockey: A Manipulation Testbed for Robot Learning with Reinforcement Learning.

[DOI]

,

,

Michael J. Munje

,

,

,

,

Siddhant Agarwal

,

,

,

,

,

,

,

,

,

CoRR, 2024

D2PO: Discriminator-Guided DPO with Response Evaluation Models.

[DOI]

Prasann Singhal

,

,

,

,

CoRR, 2024

Automated Discovery of Functional Actual Causes in Complex Environments.

[DOI]

,

Sankaran Vaidyanathan

,

Stephen Giguere

,

,

,

CoRR, 2024

Learning Action-based Representations Using Invariance.

[DOI]

,

,

,

,

,

RLJ, 2024

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions.

[DOI]

,

,

,

,

Roberto Martín-Martín

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms.

[DOI]

Rafael Rafailov

,

Yaswanth Chittepu

,

,

,

,

W. Bradley Knox

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Predicting Future Actions of Reinforcement Learning Agents.

[DOI]

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Gaze Supervision for Mitigating Causal Confusion in Driving Agents.

[DOI]

,

Badal Arun Pardhi

,

,

,

,

,

Alessandro Allievi

Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning.

[DOI]

,

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Score Models for Offline Goal-Conditioned Reinforcement Learning.

[DOI]

,

,

,

Alborz Geramifard

,

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Contrastive Preference Learning: Learning from Human Feedback without Reinforcement Learning.

[DOI]

,

Rafael Rafailov

,

,

,

,

W. Bradley Knox

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

A Dual Approach to Imitation Learning from Observations with Offline Datasets.

[DOI]

,

,

,

Proceedings of the Conference on Robot Learning, 6-9 November 2024, Munich, Germany, 2024

Learning Optimal Advantage from Preferences and Mistaking It for Reward.

[DOI]

W. Bradley Knox

,

Stephane Hatgis-Kessell

,

Sigurdur O. Adalgeirsson

,

,

,

,

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

A Ranking Game for Imitation Learning.

[DOI]

,

,

,

Trans. Mach. Learn. Res., 2023

Contrastive Preference Learning: Learning from Human Feedback without RL.

[DOI]

,

Rafael Rafailov

,

,

,

,

W. Bradley Knox

,

CoRR, 2023

Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning.

[DOI]

,

Sreehari Rammohan

,

Alessandro Allievi

,

,

George Konidaris

CoRR, 2023

Granger-Causal Hierarchical Skill Discovery.

[DOI]

,

,

,

,

CoRR, 2023

Imitation from Arbitrary Experience: A Dual Unification of Reinforcement and Imitation Learning Methods.

[DOI]

,

,

CoRR, 2023

Language-guided Task Adaptation for Imitation Learning.

[DOI]

,

Raymond J. Mooney

,

CoRR, 2023

The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications.

[DOI]

,

W. Bradley Knox

,

,

,

,

Alessandro Allievi

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Know Your Boundaries: The Necessity of Explicit Behavioral Cloning in Offline RL.

[DOI]

,

CoRR, 2022

Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?

[DOI]

,

,

,

,

Aravind Rajeswaran

Proceedings of the Learning for Dynamics and Control Conference, 2022

Understanding Acoustic Patterns of Human Teachers Demonstrating Manipulation Tasks to Robots.

[DOI]

,

,

,

Rudolf Lioutikov

,

,

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Fairness Guarantees under Demographic Shift.

[DOI]

Stephen Giguere

,

Blossom Metevier

,

Bruno Castro da Silva

,

,

Philip S. Thomas

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

Importance sampling in reinforcement learning with an estimated behavior policy.

[DOI]

Josiah P. Hanna

,

,

Mach. Learn., 2021

A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms.

[DOI]

,

,

George Konidaris

J. Mach. Learn. Res., 2021

Robust Generative Adversarial Imitation Learning via Local Lipschitzness.

[DOI]

Farzan Memarian

,

Abolfazl Hashemi

,

,

CoRR, 2021

Zero-shot Task Adaptation using Natural Language.

[DOI]

,

Raymond J. Mooney

,

CoRR, 2021

SOPE: Spectrum of Off-Policy Estimators.

[DOI]

Christina J. Yuan

,

,

Stephen Giguere

,

Philip S. Thomas

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Adversarial Intrinsic Motivation for Reinforcement Learning.

[DOI]

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Universal Off-Policy Evaluation.

[DOI]

,

,

Bruno C. da Silva

,

Erik G. Learned-Miller

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Supervised Online Reward Shaping in Sparse-Reward Environments.

[DOI]

Farzan Memarian

,

,

Rudolf Lioutikov

,

,

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Understanding the Relationship between Interactions and Outcomes in Human-in-the-Loop Machine Learning.

[DOI]

,

,

,

,

Reid G. Simmons

,

Aaron Steinfeld

,

Tesca Fitzgerald

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory.

[DOI]

,

Rudolf Lioutikov

,

,

Proceedings of the IEEE International Conference on Robotics and Automation, 2021

Value Alignment Verification.

[DOI]

Daniel S. Brown

,

Jordan Schneider

,

,

Proceedings of the 38th International Conference on Machine Learning, 2021

SCAPE: Learning Stiffness Control from Augmented Position Control Experiences.

[DOI]

,

,

Ashish D. Deshpande

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

Distributional Depth-Based Estimation of Object Articulation Models.

[DOI]

,

Stephen Giguere

,

Rudolf Lioutikov

,

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

You Only Evaluate Once: a Simple Baseline Algorithm for Offline RL.

[DOI]

,

Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021

Efficiently Guiding Imitation Learning Agents with Human Gaze.

[DOI]

,

,

Elaine Schaertl Short

,

Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

Demonstration of the EMPATHIC Framework for Task Learning from Implicit Human Feedback.

[DOI]

,

,

,

Alessandro Allievi

,

,

,

W. Bradley Knox

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Value Alignment Verification.

[DOI]

Daniel S. Brown

,

Jordan Schneider

,

CoRR, 2020

ScrewNet: Category-Independent Articulation Model Estimation From Depth Images Using Screw Theory.

[DOI]

,

Rudolf Lioutikov

,

CoRR, 2020

Efficiently Guiding Imitation Learning Algorithms with Human Gaze.

[DOI]

,

,

Elaine Schaertl Short

,

CoRR, 2020

Local Nonparametric Meta-Learning.

[DOI]

,

CoRR, 2020

Bayesian Robust Optimization for Imitation Learning.

[DOI]

Daniel S. Brown

,

,

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Learning Hybrid Object Kinematics for Efficient Hierarchical Planning Under Uncertainty.

[DOI]

,

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

Hypothesis-Driven Skill Discovery for Hierarchical Deep Reinforcement Learning.

[DOI]

,

Supawit Chockchowwat

,

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

Human Gaze Assisted Artificial Intelligence: A Review.

[DOI]

,

,

,

,

,

,

Dana H. Ballard

,

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences.

[DOI]

Daniel S. Brown

,

Russell Coleman

,

Ravi Srinivasan

,

Proceedings of the 37th International Conference on Machine Learning, 2020

PixL2R: Guiding Reinforcement Learning Using Natural Language by Mapping Pixels to Rewards.

[DOI]

,

,

Raymond J. Mooney

Proceedings of the 4th Conference on Robot Learning, 2020

The EMPATHIC Framework for Task Learning from Implicit Human Feedback.

[DOI]

,

,

W. Bradley Knox

,

Alessandro Allievi

,

,

Proceedings of the 4th Conference on Robot Learning, 2020

2019

Deep Bayesian Reward Learning from Preferences.

[DOI]

Daniel S. Brown

,

CoRR, 2019

Ranking-Based Reward Extrapolation without Rankings.

[DOI]

Daniel S. Brown

,

,

CoRR, 2019

Using Natural Language for Reward Shaping in Reinforcement Learning.

[DOI]

,

,

Raymond J. Mooney

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

One-Shot Learning of Multi-Step Tasks from Observation via Activity Localization in Auxiliary Video.

[DOI]

,

Proceedings of the International Conference on Robotics and Automation, 2019

Uncertainty-Aware Data Aggregation for Deep Imitation Learning.

[DOI]

,

,

,

Proceedings of the International Conference on Robotics and Automation, 2019

Importance Sampling Policy Evaluation with an Estimated Behavior Policy.

[DOI]

,

,

Proceedings of the 36th International Conference on Machine Learning, 2019

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations.

[DOI]

Daniel S. Brown

,

,

Prabhat Nagarajan

,

Proceedings of the 36th International Conference on Machine Learning, 2019

Enhancing Robot Learning with Human Social Cues.

[DOI]

,

Elaine Schaertl Short

,

,

Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 2019

Learning from Corrective Demonstrations.

[DOI]

Reymundo A. Gutierrez

,

Elaine Schaertl Short

,

,

Andrea Lockerd Thomaz

Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 2019

Understanding Teacher Gaze Patterns for Robot Learning.

[DOI]

,

Elaine Schaertl Short

,

,

Proceedings of the 3rd Annual Conference on Robot Learning, 2019

Better-than-Demonstrator Imitation Learning via Automatically-Ranked Demonstrations.

[DOI]

Daniel S. Brown

,

,

Proceedings of the 3rd Annual Conference on Robot Learning, 2019

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications.

[DOI]

Daniel S. Brown

,

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

LAAIR: A Layered Architecture for Autonomous Interactive Robots.

[DOI]

,

,

,

Nicolas Brissonneau

,

Daniel S. Brown

,

,

,

,

CoRR, 2018

Towards Online Learning from Corrective Demonstrations.

[DOI]

Reymundo A. Gutierrez

,

Elaine Schaertl Short

,

,

Andrea Lockerd Thomaz

CoRR, 2018

Learning Multi-Step Robotic Tasks from Observation.

[DOI]

,

CoRR, 2018

Human Gaze Following for Human-Robot Interaction.

[DOI]

,

Srinjoy Majumdar

,

Elaine Schaertl Short

,

,

Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2018

Incremental Task Modification via Corrective Demonstrations.

[DOI]

Reymundo A. Gutierrez

,

,

Andrea Lockerd Thomaz

,

Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Active Reward Learning from Critiques.

[DOI]

,

Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

Asking for Help Effectively via Modeling of Human Beliefs.

[DOI]

Taylor Kessler Faulkner

,

,

Proceedings of the Companion of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, 2018

Efficient Hierarchical Robot Motion Planning Under Uncertainty and Hybrid Dynamics.

[DOI]

,

Proceedings of the 2nd Annual Conference on Robot Learning, 2018

Risk-Aware Active Inverse Reinforcement Learning.

[DOI]

Daniel S. Brown

,

,

Proceedings of the 2nd Annual Conference on Robot Learning, 2018

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning.

[DOI]

Daniel S. Brown

,

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Safe Reinforcement Learning via Shielding.

[DOI]

Mohammed Alshiekh

,

,

Rüdiger Ehlers

,

Bettina Könighofer

,

,

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Viewpoint selection for visual failure detection.

[DOI]

,

,

Srinjoy Majumdar

,

,

Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017

Classification error correction: A case study in brain-computer interfacing.

[DOI]

Hasan A. Poonawala

,

Mohammed Alshiekh

,

,

Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017

Data-Efficient Policy Evaluation Through Behavior Policy Search.

[DOI]

Josiah P. Hanna

,

Philip S. Thomas

,

,

Proceedings of the 34th International Conference on Machine Learning, 2017

Toward Probabilistic Safety Bounds for Robot Learning from Demonstration.

[DOI]

Daniel S. Brown

,

Proceedings of the 2017 AAAI Fall Symposia, Arlington, Virginia, USA, November 9-11, 2017, 2017

Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation.

[DOI]

Josiah P. Hanna

,

,

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

High Confidence Off-Policy Evaluation with Models.

[DOI]

Josiah P. Hanna

,

,

CoRR, 2016

On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search.

[DOI]

Piyush Khandelwal

,

,

,

Proceedings of the 33rd International Conference on Machine Learning, 2016

2015

Learning grounded finite-state representations from unstructured demonstrations.

[DOI]

,

Sarah Osentoski

,

George Dimitri Konidaris

,

,

Bhaskara Marthi

,

Andrew G. Barto

Int. J. Robotics Res., 2015

Policy Evaluation Using the Ω-Return.

[DOI]

Philip S. Thomas

,

,

Georgios Theocharous

,

George Dimitri Konidaris

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Online Bayesian changepoint detection for articulated motion models.

[DOI]

,

Sarah Osentoski

,

Christopher G. Atkeson

,

Andrew G. Barto

Proceedings of the IEEE International Conference on Robotics and Automation, 2015

Active articulation model estimation through interactive perception.

[DOI]

,

,

Sarah Osentoski

,

Gaurav S. Sukhatme

Proceedings of the IEEE International Conference on Robotics and Automation, 2015

2014

Learning pouring skills from demonstration and practice.

[DOI]

Akihiko Yamaguchi

,

Christopher G. Atkeson

,

,

Tsukasa Ogasawara

Proceedings of the 14th IEEE-RAS International Conference on Humanoid Robots, 2014

2013

Incremental Semantically Grounded Learning from Demonstration.

[DOI]

,

,

Andrew G. Barto

,

Bhaskara Marthi

,

Sarah Osentoski

Proceedings of the Robotics: Science and Systems IX, Technische Universität Berlin, Berlin, Germany, June 24, 2013

An Integrated System for Learning Multi-Step Robotic Tasks from Unstructured Demonstrations.

[DOI]

Proceedings of the Designing Intelligent Robots: Reintegrating AI II, 2013

2012

Learning and generalization of complex tasks from unstructured demonstrations.

[DOI]

,

Sarah Osentoski

,

George Dimitri Konidaris

,

Andrew G. Barto

Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012

Complex Task Learning from Unstructured Demonstrations.

[DOI]

Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, 2012

2011

TDγ: Re-evaluating Complex Backups in Temporal Difference Learning.

[DOI]

George Dimitri Konidaris

,

,

Philip S. Thomas

Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Evolution of reward functions for reinforcement learning.

[DOI]

,

,

Andrew G. Barto

Proceedings of the 13th Annual Genetic and Evolutionary Computation Conference, 2011

Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery.

[DOI]

,

Andrew G. Barto

Proceedings of the Lifelong Learning, 2011

2010

Genetic Programming for Reward Function Search.

[DOI]

,

Andrew G. Barto

,

IEEE Trans. Auton. Ment. Dev., 2010

Evolved Intrinsic Reward Functions for Reinforcement Learning.

[DOI]

Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010