Hannah Kirk
ORCID: 0000-0002-7419-5993
According to our database, Hannah Kirk authored at least 35 papers between 2021 and 2024.
Bibliography
2024
Correction to: The AI community building the future? A quantitative analysis of development activity on Hugging Face Hub.
J. Comput. Soc. Sci., October, 2024
The AI community building the future? A quantitative analysis of development activity on Hugging Face Hub.
J. Comput. Soc. Sci., October, 2024
The benefits, risks and bounds of personalizing the alignment of large language models to individuals.
Nat. Mac. Intell., 2024
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages.
CoRR, 2024
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
CoRR, 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the 2024 International Conference on Information Technology for Social Good, 2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation.
Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency, 2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR, 2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR, 2023
Casteist but Not Racist? Quantifying Disparities in Large Language Model Bias between India and the West.
CoRR, 2023
DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures.
CoRR, 2023
Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets.
CoRR, 2023
Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.
CoRR, 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR, 2023
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
2022
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning.
CoRR, 2022
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning.
CoRR, 2022
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
A Prompt Array Keeps the Bias Away: Debiasing Vision-Language Models with Adversarial Learning.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
2021
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset.
CoRR, 2021
Bias Out-of-the-Box: An Empirical Analysis of Intersectional Occupational Biases in Popular Generative Language Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021