2025
QSpell 250K: A Large-Scale, Practical Dataset for Chinese Search Query Spell Correction.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

2024
mR²AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA.
CoRR, 2024

Best Practices for Distilling Large Language Models into BERT for Web Search Ranking.
CoRR, 2024

CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios.
Proceedings of the 1st Workshop on Multimodal Search and Recommendations (MMSR 2024) co-located with 33rd ACM International Conference on Information and Knowledge Management (CIKM 2024), 2024

Enhancing Asymmetric Web Search through Question-Answer Generation and Ranking.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Span Confusion is All You Need for Chinese Spelling Correction.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

WebCiteS: Attributed Query-Focused Summarization on Chinese Web Search Results with Citations.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Event-driven Real-time Retrieval in Web Search.
CoRR, 2023

A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition.
CoRR, 2023

T2Ranking: A Large-scale Chinese Benchmark for Passage Ranking.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

VTLayout: A Multi-Modal Approach for Video Text Layout.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Improving Query Correction Using Pre-train Language Model In Search Engines.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Characterizing the Impacts of Instances on Robustness.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Event-Centric Query Expansion in Web Search.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, 2023

A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Knowledge Transfer in Incremental Learning for Multilingual Neural Machine Translation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

On the Universal Adversarial Perturbations for Efficient Data-free Adversarial Detection.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

DSRM: Boost Textual Adversarial Training with Distribution Shift Risk Minimization.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding.
CoRR, 2022

Entropy-Based Vocabulary Substitution for Incremental Learning in Multilingual Neural Machine Translation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022