2024
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
CoRR, 2024

The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation.
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024

2023
Speech Wikimedia: A 77 Language Multilingual Speech Dataset.
CoRR, 2023

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.
CoRR, 2023

DataPerf: Benchmarks for Data-Centric AI Development.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
DataPerf: Benchmarks for Data-Centric AI Development.
CoRR, 2022

2021
LSH methods for data deduplication in a Wikipedia artificial dataset.
CoRR, 2021

The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage.
CoRR, 2021

Multilingual Spoken Words Corpus.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021