Hannah Kirk

Orcid: 0000-0002-7419-5993

According to our database, Hannah Kirk authored at least 35 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Auditing large language models: a three-layered approach.
AI Ethics, November, 2024

Correction to: The AI community building the future? A quantitative analysis of development activity on Hugging Face Hub.
J. Comput. Soc. Sci., October, 2024

The AI community building the future? A quantitative analysis of development activity on Hugging Face Hub.
J. Comput. Soc. Sci., October, 2024

The benefits, risks and bounds of personalizing the alignment of large language models to individuals.
Nat. Mac. Intell., 2024

The Future of Open Human Feedback.
CoRR, 2024

Modulating Language Model Experiences through Frictions.
CoRR, 2024

LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages.
CoRR, 2024

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
CoRR, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons.
CoRR, 2024

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Indian-BhED: A Dataset for Measuring India-Centric Biases in Large Language Models.
Proceedings of the 2024 International Conference on Information Technology for Social Good, 2024

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation.
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR, 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR, 2023

Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West.
CoRR, 2023

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures.
CoRR, 2023

Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets.
CoRR, 2023

Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.
CoRR, 2023

Assessing Language Model Deployment with Risk Cards.
CoRR, 2023

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR, 2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning.
CoRR, 2022

Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements.
CoRR, 2022

Handling and Presenting Harmful Text.
CoRR, 2022

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning.
CoRR, 2022

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

Handling and Presenting Harmful Text in NLP Research.
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset.
CoRR, 2021

How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases.
CoRR, 2021

Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models.
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

