2024

Operationalizing Contextual Integrity in Privacy-Conscious Assistants.

Sahra Ghalebikesabi, Eugene Bagdasaryan, [...], Laura Weidinger, Robert Stanforth, Leonard Berrada, [...]

CoRR, 2024

Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach.

[...], Sara Wiltberger, Shubham Milind Phal, Katherine L. Hermann, Daniel Kasenberg, Avishkar Bhoopchand, [...], Parsa Mahmoudieh, [...], Brett Wiltshire, [...], Jasmin Rubinovitz, [...], Julia Wilkowski, [...], Arslan Chaudhry, [...], Sridhar Thiagarajan, [...], Rachel Hashimshoni, Laura Weidinger, [...], Maxwell L. Bileschi, [...], Kelsie Van Deman, Hema Bajaj Misra, [...], Christopher Summerfield, [...], Pierre-Alexandre Kamienny, [...], Theofilos Strinopoulos, [...], Maureen Heymans, Zoubin Ghahramani, [...]

CoRR, 2024

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources.

CoRR, 2024

The Ethics of Advanced AI Assistants.

[...], Arianna Manzini, [...], Lisa Anne Hendricks, [...], Mikel Rodriguez, Seliem El-Sayed, [...], A. Stevie Bergman, [...], Juan Mateos-Garcia, Laura Weidinger, [...], Victoria Krakovna, John Oliver Siy, Zeb Kurth-Nelson, Amanda McCroskery, [...], Murray Shanahan, [...], Yetunde Ibitoye, [...], Sébastien Krier, Alexander Reese, Sims Witherspoon, [...], Matija Franklin, Josh A. Goldstein, [...], Meredith Ringel Morris, [...], Blaise Agüera y Arcas, [...]

CoRR, 2024

Holistic Safety and Responsibility Evaluations of Advanced AI Models.

Laura Weidinger, Joslyn Barnhart, [...], Christina Butterfield, [...], Lisa Anne Hendricks, Ramona Comanescu, [...], Mikel Rodriguez, Jennifer Beroshi, [...], Sebastian Farquhar, [...]

CoRR, 2024

STAR: SocioTechnical Approach to Red Teaming Language Models.

Laura Weidinger, [...], Bernat Guillen Pegueroles, [...], A. Stevie Bergman, Mikel Rodriguez, [...]

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Gaps in the Safety Evaluation of Generative AI.

[...], Arianna Manzini, Lisa Anne Hendricks, Ramona Comanescu, [...], Juan Mateos-Garcia, A. Stevie Bergman, [...], Laura Weidinger

Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) - Full Archival Papers, October 21-23, 2024, San Jose, California, USA, 2024

All Too Human? Mapping and Mitigating the Risk from Anthropomorphic AI.

[...], Laura Weidinger, Arianna Manzini, [...]

Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES-24) - Full Archival Papers, October 21-23, 2024, San Jose, California, USA, 2024

2023

Sociotechnical Safety Evaluation of Generative AI Systems.

Laura Weidinger, [...], Arianna Manzini, Lisa Anne Hendricks, Juan Mateos-Garcia, A. Stevie Bergman, [...]

CoRR, 2023

2022

Improving alignment of dialogue agents via targeted human judgements.

CoRR, 2022

Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models.

[...], Jonathan Uesato, [...], Laura Weidinger, Sumanth Dathathri, [...], Geoffrey Irving, [...], Lisa Anne Hendricks

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Taxonomy of Risks posed by Language Models.

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

2021

Scaling Language Models: Methods, Analysis & Insights from Training Gopher.

[...], Sebastian Borgeaud, [...], Jordan Hoffmann, H. Francis Song, [...], Sarah Henderson, [...], Eliza Rutherford, [...], George van den Driessche, Lisa Anne Hendricks, [...], Sumanth Dathathri, [...], Jonathan Uesato, [...], Antonia Creswell, [...], Siddhant M. Jayakumar, Elena Buchatskaya, [...], Esme Sutherland, [...], Michela Paganini, [...], Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, [...], Angeliki Lazaridou, [...], Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, [...], Thibault Sottiaux, Mantas Pajarskas, [...], Cyprien de Masson d'Autume, [...], Vladimir Mikulik, Igor Babuschkin, [...], Diego de Las Casas, [...], Matthew J. Johnson, Blake A. Hechtman, Laura Weidinger, [...], Edward Lockhart, [...], Lorrayne Bennett, [...], Koray Kavukcuoglu, Geoffrey Irving

CoRR, 2021

Ethical and social risks of harm from Language Models.

CoRR, 2021

Alignment of Language Agents.

[...], Laura Weidinger, [...], Vladimir Mikulik, Geoffrey Irving

CoRR, 2021

Modelling Cooperation in Network Games with Spatio-Temporal Complexity.

Michiel A. Bakker, Richard Everett, Laura Weidinger, [...], William S. Isaac, [...]

Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

2020

Model-free conventions in multi-agent reinforcement learning with heterogeneous preferences.

Raphael Köster, [...], Richard Everett, Laura Weidinger, William S. Isaac, [...], Edgar A. Duéñez-Guzmán, [...], Matthew M. Botvinick, [...]

CoRR, 2020