Wenhu Chen

Orcid: 0009-0002-0947-8388

According to our database, Wenhu Chen authored at least 124 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation.
Trans. Mach. Learn. Res., 2024

TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks.
Trans. Mach. Learn. Res., 2024

Harnessing Webpage UIs for Text-Rich Visual Understanding.
CoRR, 2024

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks.
CoRR, 2024

T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design.
CoRR, 2024

VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks.
CoRR, 2024

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark.
CoRR, 2024

Foundation Models for Music: A Survey.
CoRR, 2024

LongIns: A Challenging Long-context Instruction-based Exam for LLMs.
CoRR, 2024

LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs.
CoRR, 2024

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation.
CoRR, 2024

PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents.
CoRR, 2024

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences.
CoRR, 2024

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation.
CoRR, 2024

GenAI Arena: An Open Evaluation Platform for Generative Models.
CoRR, 2024

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark.
CoRR, 2024

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series.
CoRR, 2024

T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback.
CoRR, 2024

UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models.
CoRR, 2024

MAmmoTH2: Scaling Instructions from the Web.
CoRR, 2024

MANTIS: Interleaved Multi-Image Instruction Tuning.
CoRR, 2024

MuPT: A Generative Symbolic Music Pretrained Transformer.
CoRR, 2024

Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model.
CoRR, 2024

CodeEditorBench: Evaluating Code Editing Capability of Large Language Models.
CoRR, 2024

Long-context LLMs Struggle with Long In-context Learning.
CoRR, 2024

The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis.
CoRR, 2024

COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning.
CoRR, 2024

AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks.
CoRR, 2024

Reward Guided Latent Consistency Distillation.
CoRR, 2024

DEEP-ICL: Definition-Enriched Experts for Language Model In-Context Learning.
CoRR, 2024

StructLM: Towards Building Generalist Models for Structured Knowledge Grounding.
CoRR, 2024

ChatMusician: Understanding and Generating Music Intrinsically with LLM.
CoRR, 2024

CIF-Bench: A Chinese Instruction-Following Benchmark for Evaluating the Generalizability of Large Language Models.
CoRR, 2024

ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation.
CoRR, 2024

Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction.
CoRR, 2024

CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark.
CoRR, 2024

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.
CoRR, 2024

Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation.
CoRR, 2024

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Kosmos-G: Generating Images in Context with Multimodal Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ImagenHub: Standardizing the evaluation of conditional image generation models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Unifying Multimodal Retrieval via Document Screenshot Embedding.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

UniIR: Training and Benchmarking Universal Multimodal Information Retrievers.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMMU: A Massive Multi-Discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Instruct-Imagen: Image Generation with Multi-modal Instruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement.
Proceedings of the Findings of the Association for Computational Linguistics, 2024


SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
DreamEdit: Subject-driven Image Editing.
Trans. Mach. Learn. Res., 2023

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks.
Trans. Mach. Learn. Res., 2023

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models.
CoRR, 2023

Augmenting Black-box LLMs with Medical Textbooks for Clinical Question Answering.
CoRR, 2023

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training.
CoRR, 2023

Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models.
CoRR, 2023

EDIS: Entity-Driven Image Search over Multimodal Web Content.
CoRR, 2023

Interactive Natural Language Processing.
CoRR, 2023

Knowledge Discovery from Unstructured Data in Financial Services (KDF) Workshop.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MARBLE: Music Audio Representation Benchmark for Universal Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Subject-driven Text-to-Image Generation via Apprenticeship Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

LyricWhiz: Robust Multilingual Zero-Shot Lyrics Transcription by Whispering to ChatGPT.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Attacking Open-domain Question Answering by Injecting Misinformation.
Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, 2023

Re-Imagen: Retrieval-Augmented Text-to-Image Generator.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

On the Risk of Misinformation Pollution with Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

EDIS: Entity-Driven Image Search over Multimodal Web Content.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

TheoremQA: A Theorem-driven Question Answering Dataset.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Augmenting Pre-trained Language Models with QA-Memory for Open-Domain Question Answering.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Large Language Models are few(1)-shot Table Reasoners.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Few-shot In-context Learning on Knowledge Base Question Answering.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

DePlot: One-shot visual language reasoning by plot-to-table translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

QA Is the New KR: Question-Answer Pairs as Knowledge Bases.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Explanations from Large Language Models Make Small Reasoners Better.
CoRR, 2022

Controllable Dialogue Simulation with In-context Learning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Accessing Diverse Web Knowledge with Natural Language Interface.
PhD thesis, 2021

ContraQA: Question Answering under Contradicting Contexts.
CoRR, 2021

Meta Module Network for Compositional Visual Reasoning.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Counterfactual Maximum Likelihood Estimation for Training Deep Networks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Local Explanation of Dialogue Response Generation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Dataset for Answering Time-Sensitive Questions.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Unsupervised Multi-hop Question Answering by Question Generation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Open Question Answering over Tables and Text.
Proceedings of the 9th International Conference on Learning Representations, 2021

Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

FinQA: A Dataset of Numerical Reasoning over Financial Data.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Zero-shot Fact Verification by Claim Generation.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

A Systematic Investigation of KB-Text Embedding Alignment at Scale.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Modeling Token-level Uncertainty to Learn Unknown Concepts in SLU via Calibrated Dirichlet Prior RNN.
CoRR, 2020

TabFact: A Large-scale Dataset for Table-based Fact Verification.
Proceedings of the 8th International Conference on Learning Representations, 2020

HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Logic2Text: High-Fidelity Natural Language Generation from Logical Forms.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Violin: A Large-Scale Dataset for Video-and-Language Inference.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Few-Shot NLG with Pre-Trained Language Model.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Logical Natural Language Generation from Open-Domain Tables.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

How Large a Vocabulary Does Text Classification Need? A Variational Approach to Vocabulary Selection.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Mining Algorithm Roadmap in Scientific Publications.
Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019

Interpreting and Improving Deep Neural SLU Models via Vocabulary Importance.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Global Textual Relation Embedding for Relational Understanding.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Semantically Conditioned Dialog Response Generation via Hierarchical Disentangled Self-Attention.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Enhancing the Robustness of Prior Network in Out-of-Distribution Detection.
CoRR, 2018

Approximate Distribution Matching for Sequence-to-Sequence Learning.
CoRR, 2018

Variational Knowledge Graph Reasoning.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Generative Bridging Network for Neural Sequence Prediction.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

XL-NBT: A Cross-lingual Neural Belief Tracking Framework.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Video Captioning via Hierarchical Reinforcement Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Triangular Architecture for Rare Language Translation.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
Neural Sequence Prediction by Coaching.
CoRR, 2017

2016
Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning.
CoRR, 2016

Guided Alignment Training for Topic-Aware Neural Machine Translation.
Proceedings of the 12th Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track, 2016

2013
Facial Emotion Recognition Using PHOG and a Hierarchical Expression Model.
Proceedings of the 2013 5th International Conference on Intelligent Networking and Collaborative Systems, 2013

