Sara Hooker

According to our database, Sara Hooker authored at least 71 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four (see the sketch below).
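
An Erdős number of four means the shortest chain of co-authorship links from Sara Hooker to Paul Erdős has length four; the Dijkstra number is the analogous distance to Edsger W. Dijkstra. A minimal sketch of how such a distance can be computed with breadth-first search over a co-authorship graph follows; the toy names and edges are purely illustrative, not taken from the database.

from collections import deque

def collaboration_distance(coauthors, source, target):
    # Shortest number of co-authorship links between two authors,
    # found by breadth-first search; returns None if unconnected.
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for co in coauthors.get(author, ()):
            if co == target:
                return dist + 1
            if co not in seen:
                seen.add(co)
                queue.append((co, dist + 1))
    return None

# Hypothetical toy graph: A-B-C-D-E, so A and E are at distance four.
toy = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
print(collaboration_distance(toy, "A", "E"))  # prints 4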

Bibliography

2024
A large-scale audit of dataset licensing and attribution in AI.
Nat. Mac. Intell., 2024

Bridging the Data Provenance Gap Across Text, Speech and Video.
CoRR, 2024

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier.
CoRR, 2024

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation.
CoRR, 2024

The Reality of AI and Biorisk.
CoRR, 2024

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge.
CoRR, 2024

M-RewardBench: Evaluating Reward Models in Multilingual Settings.
CoRR, 2024

Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning.
CoRR, 2024

The Future of Open Human Feedback.
CoRR, 2024

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts.
CoRR, 2024

Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress.
CoRR, 2024

To Code, or Not To Code? Exploring Impact of Code in Pre-training.
CoRR, 2024

Open Problems in Technical AI Governance.
CoRR, 2024

Consent in Crisis: The Rapid Decline of the AI Data Commons.
CoRR, 2024

On the Limitations of Compute Thresholds as a Governance Strategy.
CoRR, 2024

LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives.
CoRR, 2024

IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models.
CoRR, 2024

Aya 23: Open Weight Releases to Further Multilingual Progress.
CoRR, 2024

Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning.
CoRR, 2024

On The Fairness Impacts of Hardware Selection in Machine Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLM See, LLM Do: Leveraging Active Inheritance to Target Non-Differentiable Objectives.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

How Does Quantization Affect Multilingual LLMs?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024


From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Back to Basics: Revisiting REINFORCE-Style Optimization for Learning from Human Feedback in LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Efficient Methods for Natural Language Processing: A Survey.
Trans. Assoc. Comput. Linguistics, 2023

Generalisable Agents for Neural Network Optimisation.
CoRR, 2023

Elo Uncovered: Robustness and Best Practices in Language Model Evaluation.
CoRR, 2023

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI.
CoRR, 2023

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation.
CoRR, 2023

When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale.
CoRR, 2023

Frontier AI Regulation: Managing Emerging Risks to Public Safety.
CoRR, 2023

Evaluating the Social Impact of Generative AI Systems in Systems and Society.
CoRR, 2023

Intriguing Properties of Quantization at Scale.
CoRR, 2023

FAIR-Ensemble: When Fairness Naturally Emerges From Deep Ensembling.
CoRR, 2023

Robust distillation for worst-class performance: on the interplay between teacher and student objectives.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Grand Illusion: The Myth of Software Portability and Implications for ML Progress.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Intriguing Properties of Quantization at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Locally Differentially Private Document Generation Using Zero Shot Prompting.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Large language models are not zero-shot communicators.
CoRR, 2022

Efficient Methods for Natural Language Processing: A Survey.
CoRR, 2022

Studying the impact of magnitude pruning on contrastive learning methods.
CoRR, 2022

Robust Distillation for Worst-class Performance.
CoRR, 2022

When less is more: Simplifying inputs aids neural network understanding.
CoRR, 2022

Randomness in Neural Network Training: Characterizing the Impact of Tooling.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Intriguing Properties of Compression on Multilingual Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Estimating Example Difficulty using Variance of Gradients.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Moving beyond "algorithmic bias is a data problem".
Patterns, 2021

A Tale Of Two Long Tails.
CoRR, 2021

When does loss-based prioritization fail?
CoRR, 2021

Keep the Gradients Flowing: Using Gradient Flow to Study Sparse Network Optimization.
CoRR, 2021

The hardware lottery.
Commun. ACM, 2021

The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

2020
Characterising Bias in Compressed Models.
CoRR, 2020

Estimating Example Difficulty using Variance of Gradients.
CoRR, 2020

Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims.
CoRR, 2020

2019
The (Un)reliability of Saliency Methods.
Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, 2019

Selective Brain Damage: Measuring the Disparate Impact of Model Pruning.
CoRR, 2019

The State of Sparsity in Deep Neural Networks.
CoRR, 2019

A Benchmark for Interpretability Methods in Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Evaluating Feature Importance Estimates.
CoRR, 2018

2017
The (Un)reliability of saliency methods.
CoRR, 2017

