Pengfei Liu

Orcid: 0009-0008-6932-7091

Affiliations:
  • Shanghai Jiao Tong University, Generative AI Research Lab (GAIR), China
  • Carnegie Mellon University, PA, USA (former)
  • Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, China (former)


According to our database, Pengfei Liu authored at least 126 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability.
CoRR, 2024

PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World.
CoRR, 2024

O1 Replication Journey - Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
CoRR, 2024

O1 Replication Journey: A Strategic Progress Report - Part 1.
CoRR, 2024

RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation.
CoRR, 2024

OpenResearcher: Unleashing AI for Accelerated Scientific Research.
CoRR, 2024

Understanding Reference Policies in Direct Preference Optimization.
CoRR, 2024

Halu-J: Critique-Based Hallucination Judge.
CoRR, 2024

MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models.
CoRR, 2024

ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation.
CoRR, 2024

Progress or Regress? Self-Improvement Reversal in Post-training.
CoRR, 2024

FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models.
CoRR, 2024

OlympicArena Medal Ranks: Who Is the Most Intelligent AI So Far?
CoRR, 2024

BeHonest: Benchmarking Honesty of Large Language Models.
CoRR, 2024

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI.
CoRR, 2024

Benchmarking Benchmark Leakage in Large Language Models.
CoRR, 2024

Evaluating Mathematical Reasoning Beyond Accuracy.
CoRR, 2024

Reformatted Alignment.
CoRR, 2024

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate.
CoRR, 2024

Extending LLMs' Context Window with 100 Samples.
CoRR, 2024

The Critique of Critique.
CoRR, 2024

On Learning to Summarize with Large Language Models as References.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

GPTScore: Evaluate as You Desire.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Generative Judge for Evaluating Alignment.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ECON: On the Detection and Resolution of Evidence Conflicts.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Weak-to-Strong Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLMCrit: Teaching Large Language Models to Use Criteria.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Prompt Chaining or Stepwise Prompt? Refinement in Text Summarization.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

InFoBench: Evaluating Instruction Following Ability in Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Dissecting Human and LLM Preferences.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing.
ACM Comput. Surv., 2023

Generative AI for Math: Part I - MathPile: A Billion-Token-Scale Pretraining Corpus for Math.
CoRR, 2023

Align on the Fly: Adapting Chatbot Behavior to Established Norms.
CoRR, 2023

Alignment for Honesty.
CoRR, 2023

FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios.
CoRR, 2023

Improving Factuality of Abstractive Summarization via Contrastive Reward Learning.
CoRR, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.
CoRR, 2023

On Learning to Summarize with Large Language Models as References.
CoRR, 2023

FELM: Benchmarking Factuality Evaluation of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PAL: Program-aided Language Models.
Proceedings of the International Conference on Machine Learning, 2023

GlobalBench: A Benchmark for Global Progress in Natural Language Processing.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

T5Score: Discriminative Fine-tuning of Generative Evaluation Metrics.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Multi-Dimensional Evaluation of Text Summarization with In-Context Learning.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DataFinder: Scientific Dataset Recommendation from Natural Language Descriptions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Can We Automate Scientific Reviewing?
J. Artif. Intell. Res., 2022

Searching for Effective Multilingual Fine-Tuning Methods: A Case Study in Summarization.
CoRR, 2022

Towards a Unified Multi-Dimensional Evaluator for Text Generation.
CoRR, 2022

reStructured Pre-training.
CoRR, 2022

Polyglot Prompt: Multilingual Multitask PrompTraining.
CoRR, 2022

Are All the Datasets in Benchmark Necessary? A Pilot Study of Dataset Evaluation for Text Classification.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Towards a Unified Multi-Dimensional Evaluator for Text Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Polyglot Prompt: Multilingual Multitask Prompt Training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

DataLab: A Platform for Data Analysis and Intervention.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

BRIO: Bringing Order to Abstractive Summarization.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

KID-Review: Knowledge-Guided Scientific Review Generation with Oracle Pre-training.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Hierarchical Modeling for Out-of-Scope Domain and Intent Classification.
CoRR, 2021

Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing.
CoRR, 2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation.
CoRR, 2021

BARTScore: Evaluating Generated Text as Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

RefSum: Refactoring Neural Summarization.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Larger-Context Tagging: When and Why Does It Work?
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

GSum: A General Framework for Guided Neural Abstractive Summarization.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Out-of-Scope Domain and Intent Classification through Hierarchical Joint Modeling.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Automatic Speaker-level Pronunciation Assessment of L2 Speech Using Posterior Probabilities from Multiple Utterances.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Are Factuality Checkers Reliable? Adversarial Meta-evaluation of Factuality in Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Towards More Fine-grained and Reliable NLP Performance Prediction.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

CitationIE: Leveraging the Citation Graph for Scientific Information Extraction.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

ExplainaBoard: An Explainable Leaderboard for NLP.
Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SimCLS: A Simple Framework for Contrastive Learning of Abstractive Summarization.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

SpanNER: Named Entity Re-/Recognition as Span Prediction.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
CDEvalSumm: An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems.
CoRR, 2020

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study.
CoRR, 2020

Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

RethinkCWS: Is Chinese Word Segmentation a Solved Task?
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Interpretable Multi-dataset Evaluation for Named Entity Recognition.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

An Empirical Study of Cross-Dataset Evaluation for Neural Summarization Systems.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Re-evaluating Evaluation in Text Summarization.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Extractive Summarization as Text Matching.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Heterogeneous Graph Neural Networks for Extractive Document Summarization.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Learning Sparse Sharing Architectures for Multiple Tasks.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Multi-Scale Self-Attention for Text Classification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
A Closer Look at Data Bias in Neural Extractive Summarization Models.
CoRR, 2019

Exploring Domain Shift in Extractive Text Summarization.
CoRR, 2019

DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks.
CoRR, 2019

Star-Transformer.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Searching for Effective Neural Extractive Summarization: What Works and What's Next.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

TIGS: An Inference Algorithm for Text Infilling with Gradient Search.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Learning Multi-Task Communication with Message Passing for Sequence Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Contextualized Non-Local Neural Networks for Sequence Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Multi-task Learning over Graph Structures.
CoRR, 2018

Meta-Learning Multi-task Communication.
CoRR, 2018

Meta Multi-Task Learning for Sequence Modeling.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
A model of extended paragraph vector for document categorization and trend analysis.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Adaptive Semantic Compositionality for Sentence Modelling.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Dynamic Compositional Neural Networks over Tree Structure.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Idiom-Aware Compositional Distributed Semantics.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Adversarial Multi-task Learning for Text Classification.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Deep Multi-Task Learning with Shared Memory.
CoRR, 2016

Syntax-based Attention Model for Natural Language Inference.
CoRR, 2016

Modelling Interaction of Sentence Pair with coupled-LSTMs.
CoRR, 2016

An embedding approach for context-aware collaborative recommendation and visualization.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

Recurrent Neural Network for Text Classification with Multi-Task Learning.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Learning Track Representation and Trends for Conference Analytics.
Proceedings of the 49th Hawaii International Conference on System Sciences, 2016

Modelling Interaction of Sentence Pair with Coupled-LSTMs.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Deep Multi-Task Learning with Shared Memory for Text Classification.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Deep Fusion LSTMs for Text Semantic Matching.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Implicit Discourse Relation Detection via a Deep Architecture with Gated Relevance Network.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Discourse Relations Detection via a Mixed Generative-Discriminative Framework.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Topic modeling for conference analytics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Learning Context-Sensitive Word Embeddings with Neural Tensor Skip-Gram Model.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Multi-Timescale Long Short-Term Memory Neural Network for Modelling Sentences and Documents.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Long Short-Term Memory Neural Networks for Chinese Word Segmentation.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

2014
SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

2012
mENUNCIATE: Development of a computer-aided pronunciation training system on a cross-platform framework for mobile, speech-enabled application development.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

