Bertie Vidgen
According to our database, Bertie Vidgen authored at least 42 papers between 2016 and 2024.
Bibliography
2024
The benefits, risks and bounds of personalizing the alignment of large language models to individuals.
Nat. Mac. Intell., 2024
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources.
CoRR, 2024
CoRR, 2024
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
CoRR, 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety.
CoRR, 2024
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
2023
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR, 2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR, 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR, 2023
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
Online Soc. Networks Media, 2022
Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning.
EPJ Data Sci., 2022
How can we combat online misinformation? A systematic overview of current interventions and their efficacy.
CoRR, 2022
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning.
CoRR, 2022
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models.
CoRR, 2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
2021
Tackling Racial Bias in Automated Online Hate Detection: Towards Fair and Accurate Classification of Hateful Online Users Using Geometric Deep Learning.
CoRR, 2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
2020
Proceedings of the Fourth Workshop on Online Abuse and Harms, 2020
Proceedings of the Fourth Workshop on Online Abuse and Harms, 2020
2019
What, When and Where of petitions submitted to the UK Government during a time of chaos.
CoRR, 2019
2018
2016