Sara Hooker

According to our database, Sara Hooker authored at least 71 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four (see the sketch below).
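
An Erdős number of four means the shortest chain of co-authorship links from Sara Hooker to Paul Erdős has length four; the Dijkstra number is the analogous distance to Edsger W. Dijkstra. A minimal sketch of how such a distance can be computed with breadth-first search over a co-authorship graph follows; the toy names and edges are purely illustrative, not taken from the database.

from collections import deque

def collaboration_distance(coauthors, source, target):
    # Shortest number of co-authorship links between two authors,
    # found by breadth-first search; returns None if unconnected.
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for co in coauthors.get(author, ()):
            if co == target:
                return dist + 1
            if co not in seen:
                seen.add(co)
                queue.append((co, dist + 1))
    return None

# Hypothetical toy graph: A-B-C-D-E, so A and E are at distance four.
toy = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
print(collaboration_distance(toy, "A", "E"))  # prints 4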

Bibliography

2024
A large-scale audit of dataset licensing and attribution in AI.
Nat. Mac. Intell., 2024

Bridging the Data Provenance Gap Across Text, Speech and Video.
CoRR, 2024

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier.
CoRR, 2024

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.
CoRR, 2024

The Reality of AI and Biorisk.
CoRR, 2024

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge.
CoRR, 2024

M-RewardBench: Evaluating Reward Models in Multilingual Settings.
CoRR, 2024

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning.
CoRR, 2024

The Future of Open Human Feedback.
CoRR, 2024

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts.
CoRR, 2024

Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress.
CoRR, 2024

To Code, or Not To Code? Exploring Impact of Code in Pre-training.
CoRR, 2024

Open Problems in Technical AI Governance.
CoRR, 2024

Consent in Crisis: The Rapid Decline of the AI Data Commons.
CoRR, 2024

On the Limitations of Compute Thresholds as a Governance Strategy.
CoRR, 2024

LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives.
CoRR, 2024

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models.
CoRR, 2024

Aya 23: Open Weight Releases to Further Multilingual Progress.
CoRR, 2024

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning.
CoRR, 2024

On The Fairness Impacts of Hardware Selection in Machine Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

How Does Quantization Affect Multilingual LLMs?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024


From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Efficient Methods for Natural Language Processing: A Survey.
Trans. Assoc. Comput. Linguistics, 2023

Generalisable Agents for Neural Network Optimisation.
CoRR, 2023

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation.
CoRR, 2023

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI.
CoRR, 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation.
CoRR, 2023

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale.
CoRR, 2023

Frontier AI Regulation: Managing Emerging Risks to Public Safety.
CoRR, 2023

Evaluating the Social Impact of Generative AI Systems in Systems and Society.
CoRR, 2023

Intriguing Properties of Quantization at Scale.
CoRR, 2023

FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling.
CoRR, 2023

Robust distillation for worst-class performance: on the interplay between teacher and student objectives.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Grand Illusion: The Myth of Software Portability and Implications for ML Progress.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Intriguing Properties of Quantization at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Locally Differentially Private Document Generation Using Zero Shot Prompting.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Large language models are not zero-shot communicators.
CoRR, 2022

Efficient Methods for Natural Language Processing: A Survey.
CoRR, 2022

Studying the impact of magnitude pruning on contrastive learning methods.
CoRR, 2022

Robust Distillation for Worst-class Performance.
CoRR, 2022

When less is more: Simplifying inputs aids neural network understanding.
CoRR, 2022

Randomness in Neural Network Training: Characterizing the Impact of Tooling.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Intriguing Properties of Compression on Multilingual Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Estimating Example Difficulty using Variance of Gradients.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Moving beyond "algorithmic bias is a data problem".
Patterns, 2021

A Tale Of Two Long Tails.
CoRR, 2021

When does loss-based prioritization fail?
CoRR, 2021

Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization.
CoRR, 2021

The hardware lottery.
Commun. ACM, 2021

The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020
Characterising Bias in Compressed Models.
CoRR, 2020

Estimating Example Difficulty using Variance of Gradients.
CoRR, 2020

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims.
CoRR, 2020

2019
The (Un)reliability of Saliency Methods.
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning.
CoRR, 2019

The State of Sparsity in Deep Neural Networks.
CoRR, 2019

A Benchmark for Interpretability Methods in Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Evaluating Feature Importance Estimates.
CoRR, 2018

2017
The (Un)reliability of saliency methods.
CoRR, 2017

