Dylan Hadfield-Menell
Orcid: 0000-0002-6168-4763
Affiliations:
- University of California, Berkeley, USA
According to our database, Dylan Hadfield-Menell authored at least 61 papers between 2013 and 2024.
Bibliography
2024
Trans. Recomm. Syst., September, 2024
Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024
The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability.
CoRR, 2024
CoRR, 2024
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024
2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks.
Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Cognitive Dissonance: Why Do Language Model Outputs Disagree with Internal Representations of Truthfulness?
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023
Proceedings of the Workshop on Artificial Intelligence Safety 2023 (SafeAI 2023) co-located with the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023), 2023
2022
CoRR, 2022
CoRR, 2022
Proceedings of the RecSys '22: Sixteenth ACM Conference on Recommender Systems, Seattle, WA, USA, September 18, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the International Conference on Machine Learning, 2022
A Penalty Default Approach to Preemptive Harm Disclosure and Mitigation for AI Systems.
Proceedings of the AIES '22: AAAI/ACM Conference on AI, Ethics, and Society, Oxford, United Kingdom, May 19, 2022
2021
When Curation Becomes Creation: Algorithms, microcontent, and the vanishing distinction between platforms and creators.
ACM Queue, 2021
CoRR, 2021
Proceedings of the RecSys '21: Fifteenth ACM Conference on Recommender Systems, Amsterdam, The Netherlands, 27 September 2021, 2021
Proceedings of the Conference on Robot Learning, 8-11 November 2021, London, UK, 2021
2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Silly Rules Improve the Capacity of Agents to Learn Stable Enforcement and Compliance Behaviors.
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, 2020
Proceedings of the AIES '20: AAAI/ACM Conference on AI, 2020
2019
Proceedings of the Workshop on Artificial Intelligence Safety 2019 co-located with the 28th International Joint Conference on Artificial Intelligence, 2019
Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 2019
Proceedings of the 14th ACM/IEEE International Conference on Human-Robot Interaction, 2019
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019
Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 2019
2018
Proceedings of the Robotics: Science and Systems XIV, 2018
An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018
2017
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017
Proceedings of the Robotics Research, The 18th International Symposium, 2017
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
Proceedings of the 2017 ACM/IEEE International Conference on Human-Robot Interaction, 2017
Proceedings of the Workshops of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016
Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2016
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016
2015
Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, 2015
Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2015
Proceedings of the IEEE International Conference on Robotics and Automation, 2015
2014
Unifying scene registration and trajectory optimization for learning from demonstrations with application to manipulation of deformable objects.
Proceedings of the 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014
2013
Proceedings of the 2013 IEEE International Conference on Robotics and Automation, 2013