Yilun Zhao

Orcid: 0000-0002-7470-6124

Affiliations:
  • Yale University, New Haven, CT, USA
  • Zhejiang University, Hangzhou, China (former)


According to our database, Yilun Zhao authored at least 51 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models.
Trans. Assoc. Comput. Linguistics, 2024

ReIFE: Re-evaluating Instruction-Following Evaluation.
CoRR, 2024

Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications.
CoRR, 2024

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation.
CoRR, 2024

Step-Back Profiling: Distilling User History for Personalized Scientific Writing.
CoRR, 2024

MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise.
CoRR, 2024

Evaluating LLMs at Detecting Errors in LLM Responses.
CoRR, 2024

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science.
CoRR, 2024

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models.
CoRR, 2024

Struc-Bench: Are Large Language Models Good at Generating Complex Structured Tabular Data?
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Investigating Data Contamination in Modern Benchmarks for Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Revisiting Automated Evaluation for Long-form Table Question Answering.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

P-FOLIO: Evaluating and Improving Logical Reasoning with Abundant Human-Written Reasoning Chains.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

FinDVer: Explainable Claim Verification over Long and Hybrid-content Financial Documents.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Unveiling the Spectrum of Data Contamination in Language Model: A Survey from Detection to Remediation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

KnowledgeFMath: A Knowledge-Intensive Math Reasoning Dataset in Finance Domains.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Financial Documents.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA by Content Planning and Execution-based Reasoning.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning.
CoRR, 2023

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks.
CoRR, 2023

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data.
CoRR, 2023

KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains.
CoRR, 2023

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models.
CoRR, 2023

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
CoRR, 2023

ODSum: New Benchmarks for Open Domain Multi-Document Summarization.
CoRR, 2023

Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers.
CoRR, 2023

QTSumm: A New Benchmark for Query-Focused Table Summarization.
CoRR, 2023

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.
CoRR, 2023

Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Investigating Table-to-Text Generation Capabilities of Large Language Models in Real-World Information Seeking Scenarios.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: EMNLP 2023, 2023

QTSumm: Query-Focused Summarization over Tabular Data.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

OpenRT: An Open-source Framework for Reasoning Over Tabular Data.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Apparel-Invariant Feature Learning for Person Re-Identification.
IEEE Trans. Multim., 2022

FOLIO: Natural Language Reasoning with First-Order Logic.
CoRR, 2022

FinMath: Injecting a Tree-structured Solver for Question Answering over Financial Reports.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

ReasTAP: Injecting Table Reasoning Skills During Pre-training via Synthetic Reasoning Examples.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

R2D2: Robust Data-to-Text with Replacement Detection.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
MusiCoder: A Universal Music-Acoustic Encoder Based on Transformer.
Proceedings of the MultiMedia Modeling - 27th International Conference, 2021

2020
LAMP: Label Augmented Multimodal Pretraining.
CoRR, 2020

Apparel-invariant Feature Learning for Apparel-changed Person Re-identification.
CoRR, 2020

MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers.
CoRR, 2020

