Ani Nenkova

ORCID: 0000-0002-5825-7875

  • University of Pennsylvania, Philadelphia, PA, USA

According to our database, Ani Nenkova authored at least 133 papers between 1999 and 2024.

Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores.
CoRR, 2024

How Much Annotation is Needed to Compare Summarization Models?
CoRR, 2024

ATLAS: A System for PDF-centric Human Interaction Data Collection.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2024

Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

ADOPD: A Large-Scale Document Page Decomposition Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PDFTriage: Question Answering over Long, Structured Documents.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer in Prompt Tuning.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.
CoRR, 2023

AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models.
CoRR, 2023

PDFTriage: Question Answering over Long, Structured Documents.
CoRR, 2023

Summarization from Leaderboards to Practice: Choosing A Representation Backbone and Ensuring Robustness.
CoRR, 2023

Few-Shot Dialogue Summarization via Skeleton-Assisted Prompt Transfer.
CoRR, 2023

Web Table Formatting Affects Readability on Mobile Devices.
Proceedings of the ACM Web Conference 2023, 2023

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization.
Proceedings of the 16th International Natural Language Generation Conference, 2023

Learning the Visualness of Text Using Large Vision-Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Critical Analysis of Document Out-of-Distribution Detection.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Named Entity Recognition in a Very Homogenous Domain.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Factual or Contextual? Disentangling Error Types in Entity Description Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Temporal Effects on Pre-trained Models for Language Processing Tasks.
Trans. Assoc. Comput. Linguistics, 2022

Unified Pretraining Framework for Document Understanding.
CoRR, 2022

DocTime: A Document-level Temporal Dependency Graph Parser.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DI-2022: The Third Document Intelligence Workshop.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Self-Repetition in Abstractive Neural Summarizers.
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, 2022

Context-aware Information-theoretic Causal De-biasing for Interactive Sequence Labeling.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Influence Functions for Sequence Tagging Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve.
Comput. Linguistics, 2021

UniDoc: Unified Pretraining Framework for Document Understanding.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

From Toxicity in Online Comments to Incivility in American News: Proceed with Caution.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

FRED: Fall Risk Evaluation Database Based on Electronic Health Record Data.
Proceedings of the IEEE/ACM Conference on Connected Health: Applications, 2021

The Utility and Interplay of Gazetteers and Entity Segmentation for Named Entity Recognition in English.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Trialstreamer: A living, automatically updated database of clinical trial reports.
J. Am. Medical Informatics Assoc., 2020

Understanding Clinical Trial Reports: Extracting Medical Entities and Their Relations.
CoRR, 2020

Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models.
CoRR, 2020

Trialstreamer: Mapping and Browsing Medical Evidence in Real-Time.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

Word Embeddings (Also) Encode Human Personality Stereotypes.
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics, 2019

Predicting Annotation Difficulty to Improve Task Routing and Model Performance for Biomedical Information Extraction.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Emotion Impacts Speech Recognition Performance.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

The Feasibility of Embedding Based Automatic Evaluation for Single Document Summarization.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Named Person Coreference in English News.
CoRR, 2018

Syntactic Patterns Improve Information Extraction for Medical Search.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Evaluating Multiple System Summary Lengths: A Case Study.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Combining Lexical and Syntactic Features for Detecting Content-Dense Texts in News.
J. Artif. Intell. Res., 2017

Detecting (Un)Important Content for Single-Document News Summarization.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

Aggregating and Predicting Sequence Labels from Crowd Annotations.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

The Instantiation Discourse Relation: A Corpus Analysis of Its Properties and Improved Detection.
Proceedings of the NAACL HLT 2016, 2016

Improving the Annotation of Sentence Specificity.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

An Environment for Transforming Game Character Animations Based on Nationality and Profession Personality Stereotypes.
Proceedings of the Twelfth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 2016

Phrase Generalization: a Corpus Study in Multi-Document Abstracts and Original News Alignments.
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016, 2016

Temporal Bayesian Fusion for Affect Sensing: Combining Video, Audio, and Lexical Modalities.
IEEE Trans. Cybern., 2015

Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech.
Comput. Speech Lang., 2015

Acoustic and lexical representations for affect prediction in spontaneous conversations.
Comput. Speech Lang., 2015

Inducing Lexical Style Properties for Paraphrase and Genre Differentiation.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Identification and Characterization of Newsworthy Verbs in World News.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

Detecting Content-Heavy Sentences: A Cross-Language Case Study.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

System Combination for Multi-document Summarization.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Fast and Accurate Prediction of Sentence Specificity.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

CREMA-D: Crowd-Sourced Emotional Multimodal Actors Dataset.
IEEE Trans. Affect. Comput., 2014

Reducing Sparsity Improves the Recognition of Implicit Discourse Relations.
Proceedings of the SIGDIAL 2014 Conference, 2014

Addressing Class Imbalance for Improved Recognition of Implicit Discourse Relations.
Proceedings of the SIGDIAL 2014 Conference, 2014

A Repository of State of the Art and Competitive Baseline Summaries for Generic News Summarization.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Verbose, Laconic or Just Right: A Simple Computational Model of Content Appropriateness under Length Constraints.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Improving the Estimation of Word Importance for News Multi-Document Summarization.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Cross-lingual Discourse Relation Analysis: A corpus study and a semi-supervised classification system.
Proceedings of the COLING 2014, 2014

Assessing the Discourse Factors that Influence the Quality of Machine Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Detecting Information-Dense Texts in Multiple News Domains.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

What Makes Writing Great? First Experiments on Article Quality Prediction in the Science Journalism Domain.
Trans. Assoc. Comput. Linguistics, 2013

A corpus of science journalism for analyzing writing quality.
Dialogue Discourse, 2013

Automatically Assessing Machine Summary Content Without a Gold Standard.
Comput. Linguistics, 2013

Automatic human utility evaluation of ASR systems: does WER really predict performance?
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Action Unit Models of Facial Expression of Emotion in the Presence of Speech.
Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

Animating synthetic dyadic conversations with variations based on context and agent attributes.
Comput. Animat. Virtual Worlds, 2012

An Assessment of the Accuracy of Automatic Evaluation in Summarization.
Proceedings of the Workshop on Evaluation Metrics and System Comparison for Automatic Summarization@NAACL-HLT 2012, 2012

Acoustic-Prosodic Entrainment and Social Behavior.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

A corpus of general and specific sentences from news.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Combining Ranking and Classification to Improve Emotion Recognition in Spontaneous Speech.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Combining video, audio and lexical indicators of affect in spontaneous conversation via particle filtering.
Proceedings of the International Conference on Multimodal Interaction, 2012

A Coherence Model Based on Syntactic Patterns.
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012

Lexical Differences in Autobiographical Narratives from Schizophrenic Patients and Healthy Controls.
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012

A Survey of Text Summarization Techniques.
Mining Text Data, 2012

Automatic Summarization.
Found. Trends Inf. Retr., 2011

Information Status Distinctions and Referring Expressions: An Empirical Study of References to People in News Summaries.
Comput. Linguistics, 2011

Acoustic and Prosodic Correlates of Social Behavior.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Automatic identification of general and specific sentences by leveraging discourse annotations.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

Automatic Summarization.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

Text Specificity and Impact on Quality of News Summaries.
Proceedings of the Workshop on Monolingual Text-To-Text Generation@ACL, 2011

Class-level spectral features for emotion recognition.
Speech Commun., 2010

Using entity features to classify implicit discourse relations.
Proceedings of the SIGDIAL 2010 Conference, 2010

Discourse indicators for content selection in summarization.
Proceedings of the SIGDIAL 2010 Conference, 2010

Creating Local Coherence: An Empirical Assessment.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Structural Features for Predicting the Linguistic Quality of Text - Applications to Machine Translation, Automatic Summarization and Human-Authored Text.
Empirical Methods in Natural Language Generation: Data-oriented Methods and Empirical Evaluation, 2010

Automatic Evaluation of Linguistic Quality in Multi-Document Summarization.
Proceedings of the ACL 2010, 2010

Predicting Summary Quality using Limited Human Input.
Proceedings of the Second Text Analysis Conference, 2009

Improving emotion recognition using class-level spectral features.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Automatically Evaluating Content Selection in Summarization without Human Models.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

Performance Confidence Estimation for Automatic Summarization.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Predicting the Fluency of Text with Shallow Structural Features: Case Studies of Machine Translation and Human-Written Text.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Using Syntax to Disambiguate Explicit Discourse Connectives in Text.
Proceedings of the ACL 2009, 2009

Automatic sense prediction for implicit discourse relations in text.
Proceedings of the ACL 2009, 2009

Automatic Summary Evaluation without Human Models.
Proceedings of the First Text Analysis Conference, 2008

Entity-driven Rewrite for Multi-document Summarization.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Revisiting Readability: A Unified Framework for Predicting Text Quality.
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008

Easily Identifiable Discourse Relations.
Proceedings of the COLING 2008, 2008

Can You Summarize This? Identifying Correlates of Input Difficulty for Multi-Document Summarization.
Proceedings of the ACL 2008, 2008

High Frequency Word Entrainment in Spoken Dialogue.
Proceedings of the ACL 2008, 2008

The Pyramid Method: Incorporating human content selection variation in summarization evaluation.
ACM Trans. Speech Lang. Process., 2007

Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion.
Inf. Process. Manag., 2007

To Memorize or to Predict: Prominence labeling in Conversational Speech.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Modelling prominence and emphasis improves unit-selection synthesis.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Automatic detection of contrastive elements in spontaneous speech.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Measuring Importance and Query Relevance in Topic-focused Multi-document Summarization.
Proceedings of the ACL 2007, 2007

The (Non)Utility of Linguistic Features for Predicting prominence in spontaneous speech.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization.
Proceedings of the SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

Summarization evaluation for text and speech: issues and approaches.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Do summaries help?
Proceedings of the SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005

Automatically Learning Cognitive Status for Multi-Document Summarization of Newswire.
Proceedings of the HLT/EMNLP 2005, 2005

Discourse Factors in Multi-Document Summarization.
Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005

Automatic Text Summarization of Newswire: Lessons Learned from the Document Understanding Conference.
Proceedings of the Twentieth National Conference on Artificial Intelligence, 2005

Evaluating Content Selection in Summarization: The Pyramid Method.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004

Syntactic Simplification for Improving Content Selection in Multi-Document Summarization.
Proceedings of the COLING 2004, 2004

Email Classification for Contact Centers.
Proceedings of the 2003 ACM Symposium on Applied Computing (SAC), 2003

Facilitating email thread access by extractive summary generation.
Recent Advances in Natural Language Processing III, 2003

References to Named Entities: a Corpus Study.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

Columbia's Newsblaster: New Features and Future Directions.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

A Tableau Method for Graded Intersections of Modalities: A Case for Concept Languages.
J. Log. Lang. Inf., 2002

Integration of Resources and Components in a Knowledge-Based Web-Environment for Terminology Learning.
Proceedings of the Artificial Intelligence: Methodology, 2000

User Modelling as an Application of Actors.
Proceedings of the Conceptual Structures: Standards and Practices, 1999
