Xinyu Zhang

ORCID: 0009-0009-0756-8110

Affiliations:
  • University of Waterloo, Canada


According to our database, Xinyu Zhang authored at least 57 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models.
ACM Trans. Inf. Syst., September, 2024

Toward Best Practices for Training Multilingual Dense Retrieval Models.
ACM Trans. Inf. Syst., March, 2024

Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models.
CoRR, 2024

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation.
CoRR, 2024

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM.
CoRR, 2024

Multi-Objective Forward Reasoning and Multi-Reward Backward Refinement for Product Review Summarization.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages.
Trans. Assoc. Comput. Linguistics, 2023

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation.
CoRR, 2023

Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models.
CoRR, 2023

What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations.
CoRR, 2023

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models.
CoRR, 2023

Optimizing Factual Accuracy in Text Generation through Dynamic Knowledge Selection.
CoRR, 2023

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution.
CoRR, 2023

Zero-Shot Listwise Document Reranking with a Large Language Model.
CoRR, 2023

WebBrain: Learning to Generate Factually Correct Articles for Queries by Grounding on Large Web Corpus.
CoRR, 2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction.
CoRR, 2023

CIRAL at FIRE 2023: Cross-Lingual Information Retrieval for African Languages.
Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023

Argue with Me Tersely: Towards Sentence-Level Counter-Argument Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Hi-ArG: Exploring the Integration of Hierarchical Argumentation Graphs in Language Pretraining.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Rethinking Label Smoothing on Multi-Hop Question Answering.
Proceedings of the Chinese Computational Linguistics - 22nd China National Conference, 2023

Hence, Socrates is mortal: A Benchmark for Natural Language Syllogistic Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Evaluating Embedding APIs for Information Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, 2023

2022
Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages.
CoRR, 2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers.
CoRR, 2022

Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?
CoRR, 2022

Towards Best Practices for Training Multilingual Dense Retrieval Models.
CoRR, 2022

KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models.
CoRR, 2022

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

Webformer: Pre-training with Web Pages for Information Retrieval.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Triple-Fact Retriever: An explainable reasoning retrieval model for multi-hop QA problem.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking.
Proceedings of the Advances in Information Retrieval, 2022

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

Hyperlink-induced Pre-training for Passage Retrieval in Open-domain Question Answering.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Towards More Effective and Economic Sparsely-Activated Model.
CoRR, 2021

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline.
CoRR, 2021

YES SIR! Optimizing Semantic Space of Negatives with Self-Involvement Ranker.
CoRR, 2021

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval.
CoRR, 2021

Early Exiting with Ensemble Internal Classifiers.
CoRR, 2021

Emotion Eliciting Machine: Emotion Eliciting Conversation Generation based on Dual Generator.
CoRR, 2021

Answer Complex Questions: Path Ranker Is All You Need.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers.
Proceedings of the Advances in Information Retrieval, 2021

Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 2021

Pre-training for Ad-hoc Retrieval: Hyperlink is Also You Need.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval.
Proceedings of the WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, 2020

H2oloo at TREC 2020: When all you got is a hammer... Deep Learning, Health Misinformation, and Precision Medicine.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

A Little Bit Is Worse Than None: Ranking with Limited Training Data.
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020

Flexible IR Pipelines with Capreolus.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

2017
RUCIR at the NTCIR-13 STC-2 Task.
Proceedings of the 13th NTCIR Conference, 2017
