Bertie Vidgen

According to our database, Bertie Vidgen authored at least 42 papers between 2016 and 2024.

Bibliography

2024
The benefits, risks and bounds of personalizing the alignment of large language models to individuals.
Nat. Mach. Intell., 2024

The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources.
CoRR, 2024

Risks and Opportunities of Open-Source Generative AI.
CoRR, 2024

WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting.
CoRR, 2024

Near to Mid-term Risks and Opportunities of Open Source Generative AI.
CoRR, 2024

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models.
CoRR, 2024

Introducing v0.5 of the AI Safety Benchmark from MLCommons.
CoRR, 2024

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety.
CoRR, 2024

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024



2023
FinanceBench: A New Benchmark for Financial Question Answering.
CoRR, 2023

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models.
CoRR, 2023

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models.
CoRR, 2023

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback.
CoRR, 2023

SemEval-2023 Task 10: Explainable Detection of Online Sexism.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Improving the Detection of Multilingual Online Attacks with Rich Social Media Data from Singapore.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Editorial for Special Issue on Detecting, Understanding and Countering Online Harms.
Online Soc. Networks Media, 2022

Tackling racial bias in automated online hate detection: Towards fair and accurate detection of hateful users with geometric deep learning.
EPJ Data Sci., 2022

How can we combat online misinformation? A systematic overview of current interventions and their efficacy.
CoRR, 2022

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning.
CoRR, 2022

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models.
CoRR, 2022

Handling and Presenting Harmful Text.
CoRR, 2022

Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Hatemoji: A Test Suite and Adversarially-Generated Dataset for Benchmarking and Detecting Emoji-Based Hate.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Handling and Presenting Harmful Text in NLP Research.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021
An influencer-based approach to understanding radical right viral tweets.
CoRR, 2021

Tackling Racial Bias in Automated Online Hate Detection: Towards Fair and Accurate Classification of Hateful Online Users Using Geometric Deep Learning.
CoRR, 2021

Introducing CAD: the Contextual Abuse Dataset.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Dynabench: Rethinking Benchmarking in NLP.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

An Expert Annotated Dataset for the Detection of Online Misogyny.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

HateCheck: Functional Tests for Hate Speech Detection Models.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020
Directions in Abusive Language Training Data: Garbage In, Garbage Out.
CoRR, 2020

Detecting East Asian Prejudice on Social Media.
Proceedings of the Fourth Workshop on Online Abuse and Harms, 2020

Online Abuse and Human Rights: WOAH Satellite Session at RightsCon 2020.
Proceedings of the Fourth Workshop on Online Abuse and Harms, 2020

2019
Trajectories of Islamophobic hate amongst far right actors on Twitter.
CoRR, 2019

What, When and Where of petitions submitted to the UK Government during a time of chaos.
CoRR, 2019

2018
Detecting weak and strong Islamophobic hate speech on social media.
CoRR, 2018

2016
P-values: misunderstood and misused.
CoRR, 2016

