Maarten Sap

ORCID: 0000-0002-0701-4654

Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA


According to our database, Maarten Sap authored at least 90 papers between 2014 and 2024.

Bibliography

2024
SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation.
CoRR, 2024

BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data.
CoRR, 2024

Data Defenses Against Large Language Models.
CoRR, 2024

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions.
CoRR, 2024

AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents.
CoRR, 2024

User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions.
CoRR, 2024

On the Resilience of Multi-Agent Systems with Malicious Agents.
CoRR, 2024

Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance.
CoRR, 2024

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models.
CoRR, 2024

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs.
CoRR, 2024

PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models.
CoRR, 2024

Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs.
CoRR, 2024

NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models.
CoRR, 2024

Particip-AI: A Democratic Surveying Framework for Anticipating Future AI Use Cases, Harms and Benefits.
CoRR, 2024

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

The Empirical Variability of Narrative Perceptions of Social Media Texts.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Non-Literal Intent Resolution in LLMs.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

SOTOPIA-π: Interactive Learning of Socially Intelligent Language Agents.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Where Do People Tell Stories Online? Story Detection Across Online Communities.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Don't Take This Out of Context! On the Need for Contextual Models and Evaluations for Stylistic Rewriting.
CoRR, 2023

Improving Language Models with Advantage-based Offline Policy Gradients.
CoRR, 2023

Modeling Empathic Similarity in Personal Narratives.
CoRR, 2023

Queer In AI: A Case Study in Community-Led Participatory AI.
CoRR, 2023

Towards Countering Essentialism through Social Bias Reasoning.
CoRR, 2023


BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Don't Take This Out of Context!: On the Need for Contextual Models and Evaluations for Stylistic Rewriting.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Modeling Empathic Similarity in Personal Narratives.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

NLPositionality: Characterizing Design Biases of Datasets and Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Riveter: Measuring Power and Social Dynamics Between Entities.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
CoRR, 2022

When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment.
CoRR, 2022

Computational Lens on Cognition: Study Of Autobiographical Versus Imagined Stories With Large-Scale Language Models.
CoRR, 2022

When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Aligning to Social Norms and Values in Interactive Narratives.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Misinfo Reaction Frames: Reasoning about Readers' Reactions to News Headlines.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Positive AI with Social Commonsense Models.
PhD thesis, 2021

Delphi: Towards Machine Ethics and Norms.
CoRR, 2021

On-the-Fly Controlled Text Generation with Experts and Anti-Experts.
CoRR, 2021

Misinfo Belief Frames: A Case Study on Covid & Climate News.
CoRR, 2021

Documenting the English Colossal Clean Crawled Corpus.
CoRR, 2021

Detoxifying Language Models Risks Marginalizing Minority Voices.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Challenges in Automated Debiasing for Toxic Language Detection.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
PowerTransformer: Unsupervised Controllable Revision for Biased Language Correction.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Social Chemistry 101: Learning to Reason about Social and Moral Norms.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Commonsense Reasoning for Natural Language Processing.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, 2020

Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Social Bias Frames: Reasoning about Social and Power Implications of Language.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Exploring the Effect of Author and Reader Identity in Online Story Writing: the STORIESINTHEWILD Corpus.
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, 2020

2019
SocialIQA: Commonsense Reasoning about Social Interactions.
CoRR, 2019

Social IQa: Commonsense Reasoning about Social Interactions.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

The Risk of Racial Bias in Hate Speech Detection.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

COMET: Commonsense Transformers for Automatic Knowledge Graph Construction.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Sounding Board: A User-Centric and Content-Driven Social Chatbot.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018

Event2Mind: Commonsense Inference on Events, Intents, and Reactions.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Modeling Naive Psychology of Characters in Simple Commonsense Stories.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
DLATK: Differential Language Analysis ToolKit.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Connotation Frames of Power and Agency in Modern Films.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Story Cloze Task: UW NLP System.
Proceedings of the 2nd Workshop on Linking Models of Lexical, 2017

The Effect of Different Writing Tasks on Linguistic Style: A Case Study of the ROC Story Cloze Task.
Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), 2017

2016
Predicting Individual Well-Being Through the Language of Social Media.
Proceedings of the Biocomputing 2016: Proceedings of the Pacific Symposium, 2016

2015
Extracting Human Temporal Orientation from Facebook Language.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task.
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015

The role of personality, age, and gender in tweeting about mental illness.
Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2015

2014
Developing Age and Gender Predictive Lexica over Social Media.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Towards Assessing Changes in Degree of Depression through Facebook.
Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014

