2025
Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models.
CoRR, January, 2025
2024
Formal contracts mitigate social dilemmas in multi-agent reinforcement learning.
Auton. Agents Multi Agent Syst., December, 2024
Building Human Values into Recommender Systems: An Interdisciplinary Synthesis.
Trans. Recomm. Syst., September, 2024
Goal Inference from Open-Ended Dialog.
CoRR, 2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024
The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability.
CoRR, 2024
Defending Against Unforeseen Failure Modes with Latent Adversarial Training.
CoRR, 2024
Eight Methods to Evaluate Robust Unlearning in LLMs.
CoRR, 2024
Melting Pot Contest: Charting the Future of Generalized Cooperative Intelligence.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Black-Box Access is Insufficient for Rigorous AI Audits.
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024
2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023
Measuring the Success of Diffusion Models at Imitating Human Artists.
CoRR, 2023
Explore, Establish, Exploit: Red Teaming Language Models from Scratch.
CoRR, 2023
Benchmarking Interpretability Tools for Deep Neural Networks.
CoRR, 2023
Recommending to Strategic Users.
CoRR, 2023
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks.
Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning, 2023
Red Teaming Deep Neural Networks with Feature Synthesis Tools.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023
White-Box Adversarial Policies in Deep Reinforcement Learning.
Proceedings of the Workshop on Artificial Intelligence Safety 2023 (SafeAI 2023) co-located with the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023), 2023
2022
Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks.
CoRR, 2022
Building Human Values into Recommender Systems: An Interdisciplinary Synthesis.
CoRR, 2022
How to talk so your robot will learn: Instructions, descriptions, and pragmatics.
CoRR, 2022
Linguistic communication as (inverse) reward design.
CoRR, 2022
Towards Psychologically-Grounded Dynamic Preference Models.
Proceedings of the RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18, 2022
How to talk so AI will learn: Instructions, descriptions, and autonomy.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Robust Feature-Level Adversaries are Interpretability Tools.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Estimating and Penalizing Induced Preference Shifts in Recommender Systems.
Proceedings of the International Conference on Machine Learning, 2022
A Penalty Default Approach to Preemptive Harm Disclosure and Mitigation for AI Systems.
Proceedings of the AIES '22: AAAI/ACM Conference on AI, Ethics, and Society, Oxford, United Kingdom, May 19, 2022
2021
When Curation Becomes Creation: Algorithms, microcontent, and the vanishing distinction between platforms and creators.
ACM Queue, 2021
What are you optimizing for? Aligning Recommender Systems with Human Values.
CoRR, 2021
When curation becomes creation.
Commun. ACM, 2021
Estimating and Penalizing Preference Shift in Recommender Systems.
Proceedings of the RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021, 2021
Guided Imitation of Task and Motion Planning.
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021
2020
Multi-Principal Assistance Games: Definition and Collegial Mechanisms.
CoRR, 2020
Multi-Principal Assistance Games.
CoRR, 2020
Consequences of Misaligned AI.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Silly Rules Improve the Capacity of Agents to Learn Stable Enforcement and Compliance Behaviors.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020
Conservative Agency via Attainable Utility Preservation.
Proceedings of the AIES '20: AAAI/ACM Conference on AI, Ethics, and Society, 2020
2019
An Extensible Interactive Interface for Agent Design.
CoRR, 2019
Adversarial Training with Voronoi Constraints.
CoRR, 2019
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019
On the Utility of Model Learning in HRI.
Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 2019
The Assistive Multi-Armed Bandit.
Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 2019
Human-AI Learning Performance in Multi-Armed Bandits.
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019
Incomplete Contracting and AI Alignment.
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019
Legible Normativity for AI Alignment: The Value of Silly Rules.
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019
2018
On the Geometry of Adversarial Examples.
CoRR, 2018
Active Inverse Reward Design.
CoRR, 2018
Simplifying Reward Design through Divide-and-Conquer.
Proceedings of the Robotics: Science and Systems XIV, 2018
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018
2017
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Pragmatic-Pedagogic Value Alignment.
Proceedings of the Robotics Research, The 18th International Symposium, 2017
Should Robots be Obedient?
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
Expressive Robot Motion Timing.
Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017
Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
Cooperative Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
Sequential quadratic programming for task plan optimization.
Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016
Guided search for task and motion plans using learned heuristics.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016
2015
Multitasking: Optimal Planning for Bandit Superprocesses.
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015
Modular task and motion planning in belief space.
Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015
Beyond lowest-warping cost action selection in trajectory transfer.
Proceedings of the IEEE International Conference on Robotics and Automation, 2015
2014
Unifying scene registration and trajectory optimization for learning from demonstrations with application to manipulation of deformable objects.
Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014
2013
Optimization in the now: Dynamic peephole optimization for hierarchical planning.
Proceedings of the 2013 IEEE International Conference on Robotics and Automation, 2013