Pengfei Liu

CoRR, January 2025

2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.

CoRR, 2024

PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World.

CoRR, 2024

O1 Replication Journey - Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

CoRR, 2024

O1 Replication Journey: A Strategic Progress Report - Part 1.

CoRR, 2024

OpenResearcher: Unleashing AI for Accelerated Scientific Research.

CoRR, 2024

Understanding Reference Policies in Direct Preference Optimization.

Yixin Liu

Arman Cohan

CoRR, 2024

Halu-J: Critique-Based Hallucination Judge.

CoRR, 2024

MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models.

CoRR, 2024

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation.

CoRR, 2024

Progress or Regress? Self-Improvement Reversal in Post-training.

Ting Wu

Xuefeng Li

CoRR, 2024

FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models.

Yiyuan Li

Shichao Sun

CoRR, 2024

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?

CoRR, 2024

BeHonest: Benchmarking Honesty of Large Language Models.

CoRR, 2024

Benchmarking Benchmark Leakage in Large Language Models.

CoRR, 2024

Evaluating Mathematical Reasoning Beyond Accuracy.

CoRR, 2024

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate.

CoRR, 2024

Extending LLMs' Context Window with 100 Samples.

Yikai Zhang

Junlong Li

CoRR, 2024

The Critique of Critique.

CoRR, 2024

MathPile: A Billion-Token-Scale Pretraining Corpus for Math.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Alignment for Honesty.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

On Learning to Summarize with Large Language Models as References.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization.

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

GPTScore: Evaluate as You Desire.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Generative Judge for Evaluating Alignment.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

ECON: On the Detection and Resolution of Evidence Conflicts.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Reformatted Alignment.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Weak-to-Strong Reasoning.

Yuqing Yang

Yan Ma

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLMCrit: Teaching Large Language Models to Use Criteria.

Matthias Gallé

Proceedings of the Findings of the Association for Computational Linguistics, 2024

Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

InFoBench: Evaluating Instruction Following Ability in Large Language Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation.

Yan Ma

Yu Qiao

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Dissecting Human and LLM Preferences.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.

ACM Comput. Surv., 2023

Generative AI for Math: Part I - MathPile: A Billion-Token-Scale Pretraining Corpus for Math.

Zengzhi Wang

Rui Xia

CoRR, 2023

Align on the Fly: Adapting Chatbot Behavior to Established Norms.

CoRR, 2023

FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios.

CoRR, 2023

Improving Factuality of Abstractive Summarization via Contrastive Reward Learning.

CoRR, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.

Antonios Anastasopoulos

CoRR, 2023

On Learning to Summarize with Large Language Models as References.

CoRR, 2023

FELM: Benchmarking Factuality Evaluation of Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PAL: Program-aided Language Models.

Proceedings of the International Conference on Machine Learning, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.

Antonios Anastasopoulos

Swarnashree Mysore Sathyendra

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Multi-Dimensional Evaluation of Text Summarization with In-Context Learning.

Sameer Jain

Vaishakh Keshava

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Can We Automate Scientific Reviewing?

J. Artif. Intell. Res., 2022

Searching for Effective Multilingual Fine-Tuning Methods: A Case Study in Summarization.

Yiwei Qin

CoRR, 2022

Towards a Unified Multi-Dimensional Evaluator for Text Generation.

CoRR, 2022

reStructured Pre-training.

CoRR, 2022

Polyglot Prompt: Multilingual Multitask PrompTraining.

See-Kiong Ng

CoRR, 2022

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Towards a Unified Multi-Dimensional Evaluator for Text Generation.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Polyglot Prompt: Multilingual Multitask Prompt Training.

See-Kiong Ng

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

DataLab: A Platform for Data Analysis and Intervention.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

BRIO: Bringing Order to Abstractive Summarization.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

KID-Review: Knowledge-Guided Scientific Review Generation with Oracle Pre-training.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Hierarchical Modeling for Out-of-Scope Domain and Intent Classification.

Kun Li

Helen Meng

CoRR, 2021

Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing.

CoRR, 2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation.

CoRR, 2021

BARTScore: Evaluating Generated Text as Text Generation.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

RefSum: Refactoring Neural Summarization.

Yixin Liu

Zi-Yi Dou

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Larger-Context Tagging: When and Why Does It Work?

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

GSum: A General Framework for Guided Neural Abstractive Summarization.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Out-of-Scope Domain and Intent Classification through Hierarchical Joint Modeling.

Kun Li

Helen Meng

Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Automatic Speaker-level Pronunciation Assessment of L2 Speech Using Posterior Probabilities from Multiple Utterances.

Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization.

Yiran Chen

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Towards More Fine-grained and Reliable NLP Performance Prediction.

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

CitationIE: Leveraging the Citation Graph for Scientific Information Extraction.

Vijay Viswanathan

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

ExplainaBoard: An Explainable Leaderboard for NLP.

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization.

Yixin Liu

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SpanNER: Named Entity Re-/Recognition as Span Prediction.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems.

CoRR, 2020

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study.

CoRR, 2020

Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.

Kun Li

Helen Meng

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

RethinkCWS: Is Chinese Word Segmentation a Solved Task?

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Interpretable Multi-dataset Evaluation for Named Entity Recognition.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Re-evaluating Evaluation in Text Summarization.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Extractive Summarization as Text Matching.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Heterogeneous Graph Neural Networks for Extractive Document Summarization.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Learning Sparse Sharing Architectures for Multiple Tasks.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Multi-Scale Self-Attention for Text Classification.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study.

Qi Zhang

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

A Closer Look at Data Bias in Neural Extractive Summarization Models.

CoRR, 2019

Exploring Domain Shift in Extractive Text Summarization.

CoRR, 2019

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks.

CoRR, 2019

Star-Transformer.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Searching for Effective Neural Extractive Summarization: What Works and What's Next.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

TIGS: An Inference Algorithm for Text Infilling with Gradient Search.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Learning Multi-Task Communication with Message Passing for Sequence Learning.

Jackie Chi Kit Cheung

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Contextualized Non-Local Neural Networks for Sequence Learning.

Jackie Chi Kit Cheung

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Multi-task Learning over Graph Structures.

Jackie Chi Kit Cheung

CoRR, 2018

Meta-Learning Multi-task Communication.

CoRR, 2018

Meta Multi-Task Learning for Sequence Modeling.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

A model of extended paragraph vector for document categorization and trend analysis.

King Keung Wu

Helen M. Meng

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Adaptive Semantic Compositionality for Sentence Modelling.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Dynamic Compositional Neural Networks over Tree Structure.

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Idiom-Aware Compositional Distributed Semantics.

Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Adversarial Multi-task Learning for Text Classification.

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016

Deep Multi-Task Learning with Shared Memory.

CoRR, 2016

Syntax-based Attention Model for Natural Language Inference.

CoRR, 2016

Modelling Interaction of Sentence Pair with coupled-LSTMs.

CoRR, 2016

An embedding approach for context-aware collaborative recommendation and visualization.

Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

Recurrent Neural Network for Text Classification with Multi-Task Learning.

Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Learning Track Representation and Trends for Conference Analytics.

Proceedings of the 49th Hawaii International Conference on System Sciences, 2016

Modelling Interaction of Sentence Pair with Coupled-LSTMs.

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Deep Multi-Task Learning with Shared Memory for Text Classification.

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Deep Fusion LSTMs for Text Semantic Matching.

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Implicit Discourse Relation Detection via a Deep Architecture with Gated Relevance Network.

Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Discourse Relations Detection via a Mixed Generative-Discriminative Framework.

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015

Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.

Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Topic modeling for conference analytics.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents.

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings.

Shafiq R. Joty

Helen M. Meng

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Long Short-Term Memory Neural Networks for Chinese Word Segmentation.

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

2014

SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis.
