Jimmy Lin

ORCID: 0000-0002-0661-7189

Affiliations:
  • University of Waterloo, David R. Cheriton School of Computer Science
  • Twitter Inc., San Francisco, USA
  • University of Maryland, College Park, Institute for Advanced Computer Studies (UMIACS)
  • Massachusetts Institute of Technology (MIT), Artificial Intelligence Laboratory


According to our database, Jimmy Lin authored at least 557 papers between 1999 and 2024.

Awards

ACM Fellow 2022, "For contributions to question answering, information retrieval, and natural language processing".

Bibliography

2024
Toward Best Practices for Training Multilingual Dense Retrieval Models.
ACM Trans. Inf. Syst., March, 2024

MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems.
CoRR, 2024

Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes?
CoRR, 2024

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models.
CoRR, 2024

Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism.
CoRR, 2024

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track.
CoRR, 2024

Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models.
CoRR, 2024

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation.
CoRR, 2024

UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor.
CoRR, 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
CoRR, 2024

UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models.
CoRR, 2024

LLMs Can Patch Up Missing Relevance Judgments in Evaluation.
CoRR, 2024

FLAME: Factuality-Aware Alignment for Large Language Models.
CoRR, 2024

Vector Search with OpenAI Embeddings: Lucene Is All You Need.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

Reflections on the Coding Ability of LLMs for Analyzing Market Research Surveys.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

On Backbones and Training Regimes for Dense Retrieval in African Languages.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Fine-Tuning LLaMA for Multi-Stage Text Retrieval.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Robust QA Evaluation via Open LLMs.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024


Can Query Expansion Improve Generalization of Strong Cross-Encoder Rankers?
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

CELI: Simple yet Effective Approach to Enhance Out-of-Domain Generalization of Cross-Encoders.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation.
Proceedings of The First Workshop on Large Language Models for Evaluation in Information Retrieval (LLM4Eval 2024) co-located with 10th International Conference on Online Publishing (SIGIR 2024), 2024

PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

"Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA Datasets with Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Unifying Multimodal Retrieval via Document Screenshot Embedding.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Towards Automated End-to-End Health Misinformation Free Search with a Large Language Model.
Proceedings of the Advances in Information Retrieval, 2024

EWEK-QA : Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

Jointly Modeling Spatio-Temporal Features of Tactile Signals for Action Classification.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations.
ACM Trans. Inf. Syst., October, 2023

A Dense Representation Framework for Lexical and Semantic Matching.
ACM Trans. Inf. Syst., October, 2023

Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval.
Trans. Assoc. Comput. Linguistics, 2023

MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages.
Trans. Assoc. Comput. Linguistics, 2023

Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models.
CoRR, 2023

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation.
CoRR, 2023

Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models.
CoRR, 2023

RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
CoRR, 2023

Searching Dense Representations with Inverted Indexes.
CoRR, 2023

What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations.
CoRR, 2023

End-to-End Retrieval with Learned Dense and Sparse Representations Using Lucene.
CoRR, 2023

Generate, Filter, and Fuse: Query Expansion via Multi-Step Keyword Generation for Zero-Shot Neural Rankers.
CoRR, 2023

Found in the Middle: Permutation Self-Consistency Improves Listwise Ranking in Large Language Models.
CoRR, 2023

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models.
CoRR, 2023

Unsupervised Chunking with Hierarchical RNN.
CoRR, 2023

Approximating Human-Like Few-shot Learning with GPT-based Compression.
CoRR, 2023

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution.
CoRR, 2023

Resources for Brewing BEIR: Reproducible Reference Models and an Official Leaderboard.
CoRR, 2023

Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain.
CoRR, 2023

SmartProbe: A Virtual Moderator for Market Research Surveys.
CoRR, 2023

Zero-Shot Listwise Document Reranking with a Large Language Model.
CoRR, 2023

Improving Out-of-Distribution Generalization of Neural Rerankers with Contextualized Late Interaction.
CoRR, 2023

Which Model Shall I Choose? Cost/Quality Trade-offs for Text Classification Tasks.
CoRR, 2023

TREC2023 AToMiC Overview.
Proceedings of the Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), 2023

TREC 2023 - h2oloo in the Product Search Challenge.
Proceedings of the Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), 2023

Naverloo @ TREC Deep Learning and Neuclir 2023: As Easy as Zero, One, Two, Three - Cascading Dual Encoders, Mono, Duo, and Listo for Ad-Hoc Retrieval.
Proceedings of the Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), 2023

Overview of the TREC 2023 Deep Learning Track.
Proceedings of the Thirty-Second Text REtrieval Conference Proceedings (TREC 2023), 2023

One Blade for One Purpose: Advancing Math Information Retrieval using Hybrid Search.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

AToMiC: An Image/Text Retrieval Test Collection to Support Multimedia Content Creation.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

SLIM: Sparsified Late Interaction for Multi-Vector Retrieval with Inverted Indexes.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

MMEAD: MS MARCO Entity Annotations and Disambiguations.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Tevatron: An Efficient and Flexible Toolkit for Neural Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Enhancing Sparse Retrieval via Unsupervised Learning.
Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, 2023

Overview of the CIRAL Track at FIRE 2023: Cross-lingual Information Retrieval for African Languages.
Proceedings of the Working Notes of FIRE 2023, 2023

CIRAL at FIRE 2023: Cross-Lingual Information Retrieval for African Languages.
Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023

How Does Generative Retrieval Scale to Millions of Passages?
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Better Quality Pre-training Data and T5 Models for African Languages.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

How to Train Your Dragon: Diverse Augmentation Towards Generalizable Dense Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

mAggretriever: A Simple yet Effective Approach to Zero-Shot Multilingual Dense Retrieval.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering.
Proceedings of the Advances in Information Retrieval, 2023

PyGaggle: A Gaggle of Resources for Open-Domain Question Answering.
Proceedings of the Advances in Information Retrieval, 2023

Answer Retrieval for Math Questions Using Structural and Dense Retrieval.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2023

Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

Operator Selection and Ordering in a Pipeline Approach to Efficiency Optimizations for Transformers.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

What the DAAM: Interpreting Stable Diffusion Using Cross Attention.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Evaluating Embedding APIs for Information Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, 2023

"Low-Resource" Text Classification: A Parameter-Free Classification Method with Compressors.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Precise Zero-Shot Dense Retrieval without Relevance Labels.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Building a Culture of Reproducibility in Academic Research.
CoRR, 2022

Less is More: Parameter-Free Text Classification with Gzip.
CoRR, 2022

On the Interaction Between Differential Privacy and Gradient Compression in Deep Learning.
CoRR, 2022

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages.
CoRR, 2022

Query Expansion Using Contextual Clue Sampling with Language Models.
CoRR, 2022

Better Than Whitespace: Information Retrieval for Languages without Custom Tokenizers.
CoRR, 2022

What the DAAM: Interpreting Stable Diffusion Using Cross Attention.
CoRR, 2022

Aggretriever: A Simple Approach to Aggregate Textual Representation for Robust Dense Passage Retrieval.
CoRR, 2022

Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers.
CoRR, 2022

Domain Adaptation for Memory-Efficient Dense Retrieval.
CoRR, 2022

Towards Best Practices for Training Multilingual Dense Retrieval Models.
CoRR, 2022

Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval.
CoRR, 2022

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval.
CoRR, 2022

Can Old TREC Collections Reliably Evaluate Modern Neural Retrieval Models?
CoRR, 2022

Aligning the Research and Practice of Building Search Applications: Elasticsearch and Pyserini.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

Overview of the TREC 2022 Deep Learning Track.
Proceedings of the Thirty-First Text REtrieval Conference, 2022

Too Many Relevants: Whither Cranfield Test Collections?
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

A Common Framework for Exploring Document-at-a-Time and Score-at-a-Time Retrieval Methods.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Flipping the Script: Inverse Information Seeking Dialogues for Market Research.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Neural Query Synthesis and Domain-Specific Ranking Templates for Multi-Stage Clinical Trial Matching.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO V1 and V2.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Another Look at Information Retrieval as Statistical Translation.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Fostering Coopetition While Plugging Leaks: The Design and Implementation of the MS MARCO Leaderboards.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

To Interpolate or not to Interpolate: PRF, Dense and Sparse Retrievers.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Few-Shot Non-Parametric Learning with Deep Latent Variable Model.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Integration of text and geospatial search for hydrographic datasets using the Lucene search library.
Proceedings of the JCDL '22: The ACM/IEEE Joint Conference on Digital Libraries in 2022, Cologne, Germany, June 20, 2022

Temporal Early Exiting for Streaming Speech Commands Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Evaluating Token-Level and Passage-Level Dense Retrieval Models for Math Information Retrieval.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Improving Precancerous Case Characterization via Transformer-based Ensemble Learning.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: EMNLP 2022 - Industry Track, Abu Dhabi, UAE, December 7, 2022

SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: EMNLP 2022 - Industry Track, Abu Dhabi, UAE, December 7, 2022

AfriCLIRMatrix: Enabling Cross-Lingual Information Retrieval for African Languages.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

XRICL: Cross-lingual Retrieval-Augmented In-Context Learning for Cross-lingual Text-to-SQL Semantic Parsing.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Cross-lingual Text-to-SQL Semantic Parsing with Representation Mixup.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Squeezing Water from a Stone: A Bag of Tricks for Further Improving Cross-Encoder Effectiveness for Reranking.
Proceedings of the Advances in Information Retrieval, 2022

Another Look at DPR: Reproduction of Training and Replication of Retrieval.
Proceedings of the Advances in Information Retrieval, 2022

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback: A Reproducibility Study.
Proceedings of the Advances in Information Retrieval, 2022

REBL: Entity Linking at Scale (prototype).
Proceedings of the Third International Conference on Design of Experimental Search & Information REtrieval Systems, 2022

Applying Structural and Dense Semantic Matching for the ARQMath Lab 2022, CLEF.
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th - to, 2022

Pseudo-Relevance Feedback with Dense Retrievers in Pyserini.
Proceedings of the 26th Australasian Document Computing Symposium, 2022

VoxelCache: Accelerating Online Mapping in Robotics and 3D Reconstruction Tasks.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
Pretrained Transformers for Text Ranking: BERT and Beyond
Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, ISBN: 978-3-031-02181-7, 2021

Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting.
ACM Trans. Inf. Syst., 2021

A proposed conceptual framework for a representational approach to information retrieval.
SIGIR Forum, 2021

The proper care and feeding of CAMELS: How limited training data affects streamflow prediction.
Environ. Model. Softw., 2021

Fostering Community Engagement through Datathon Events: The Archives Unleashed Experience.
Digit. Humanit. Q., 2021

Sparsifying Sparse Representations for Passage Retrieval by Top-k Masking.
CoRR, 2021

Densifying Sparse Representations for Passage Retrieval by Representational Slicing.
CoRR, 2021

Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation.
CoRR, 2021

Encoder Adaptation of Dense Passage Retrieval for Open-Domain Question Answering.
CoRR, 2021

Cross-Lingual Training with Dense Retrieval for Document Retrieval.
CoRR, 2021

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval.
CoRR, 2021

A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques.
CoRR, 2021

A Replication Study of Dense Passage Retriever.
CoRR, 2021

Investigating the Limitations of the Transformers with Simple Arithmetic Tasks.
CoRR, 2021

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations.
CoRR, 2021

The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models.
CoRR, 2021

Overview of the TREC 2021 Deep Learning Track.
Proceedings of the Thirtieth Text REtrieval Conference, 2021

PYA0: A Python Toolkit for Accessible Math-Aware Search.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Chatty Goose: A Python Framework for Conversational Search.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Pretrained Transformers for Text Ranking: BERT and Beyond.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Pyserini: A Python Toolkit for Reproducible Information Retrieval Research with Sparse and Dense Representations.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Significant Improvements over the State of the Art? A Case Study of the MS MARCO Document Ranking Leaderboard.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Efficiently Teaching an Effective Dense Retriever with Balanced Topic Aware Sampling.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.
Proceedings of the 6th Workshop on Representation Learning for NLP, 2021

The Simplest Thing That Can Possibly Work: (Pseudo-)Relevance Feedback via Text Classification.
Proceedings of the ICTIR '21: The 2021 ACM SIGIR International Conference on the Theory of Information Retrieval, 2021

Learning to Rank in the Age of Muppets: Effectiveness-Efficiency Tradeoffs in Multi-Stage Ranking.
Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, 2021

Voice Query Auto Completion.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Simple and Effective Unsupervised Redundancy Elimination to Compress Dense Vectors for Passage Retrieval.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Contextualized Query Embeddings for Conversational Search.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Multi-Task Dense Retrieval via Model Uncertainty Fusion for Open-Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Unsupervised Chunking as Syntactic Structure Induction with a Knowledge-Transfer Approach.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Comparing Score Aggregation Approaches for Document Retrieval with Pretrained Transformers.
Proceedings of the Advances in Information Retrieval, 2021

Don't Change Me! User-Controllable Selective Paraphrase Generation.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

BERxiT: Early Exiting for BERT with Better Fine-Tuning and Extension to Regression.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Rescuing historical climate observations to support hydrological research: a case study of solar radiation data.
Proceedings of the DocEng '21: ACM Symposium on Document Engineering 2021, 2021

On the Separation of Logical and Physical Ranking Models for Text Retrieval Applications.
Proceedings of the Second International Conference on Design of Experimental Search & Information REtrieval Systems, 2021

Serverless BM25 Search and BERT Reranking.
Proceedings of the Second International Conference on Design of Experimental Search & Information REtrieval Systems, 2021

Approach Zero and Anserini at the CLEF-2021 ARQMath Track: Applying Substructure Search and BM25 on Operator Tree Path Tokens.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 21st - to, 2021

How Does BERT Rerank Passages? An Attribution Analysis with Information Bottlenecks.
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2021

The Art of Abstention: Selective Prediction and Error Regularization for Natural Language Processing.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Exploring Listwise Evidence Reasoning with T5 for Fact Verification.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Semantics of the Unwritten: The Effect of End of Paragraph and Sequence Tokens on Text Generation with GPT2.
Proceedings of the ACL-IJCNLP 2021 Student Research Workshop, 2021

Scientific Claim Verification with VerT5erini.
Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis, 2021

Segatron: Segment-Aware Transformer for Language Modeling and Understanding.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
The ubiquity of large graphs and surprising challenges of graph processing: extended survey.
VLDB J., 2020

Navigation-based candidate expansion and pretrained language models for citation recommendation.
Scientometrics, 2020

Building community at distance: a datathon during COVID-19.
Digit. Libr. Perspect., 2020

Inserting Information Bottlenecks for Attribution in Transformers.
CoRR, 2020

Distilling Dense Representations for Ranking using Tightly-Coupled Teachers.
CoRR, 2020

Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures.
CoRR, 2020

Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network.
CoRR, 2020

Howl: A Deployed, Open-Source Wake Word Detection System.
CoRR, 2020

To Paraphrase or Not To Paraphrase: User-Controllable Selective Paraphrase Generation.
CoRR, 2020

A Data Scientist's Guide to Streamflow Prediction.
CoRR, 2020

Generalized Optimal Sparse Decision Trees.
CoRR, 2020

Query Reformulation using Query History for Passage Retrieval in Conversational Search.
CoRR, 2020

SegaBERT: Pre-training of Segment-aware BERT for Language Understanding.
CoRR, 2020

Rapidly Bootstrapping a Question Answering Dataset for COVID-19.
CoRR, 2020

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned.
CoRR, 2020

Semantics of the Unwritten.
CoRR, 2020

Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models.
CoRR, 2020

TTTTTackling WinoGrande Schemas.
CoRR, 2020

Document Ranking with a Pretrained Sequence-to-Sequence Model.
CoRR, 2020

Rapid Adaptation of BERT for Information Extraction on Domain-Specific Business Documents.
CoRR, 2020

A Prototype of Serverless Lucene.
CoRR, 2020

Distant Supervision for Multi-Stage Fine-Tuning in Retrieval-Based Question Answering.
Proceedings of the WWW '20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020

Capreolus: A Toolkit for End-to-End Neural Ad Hoc Retrieval.
Proceedings of the WSDM '20: The Thirteenth ACM International Conference on Web Search and Data Mining, 2020

H2oloo at TREC 2020: When all you got is a hammer... Deep Learning, Health Misinformation, and Precision Medicine.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

TREC 2020 Notebook: CAsT Track.
Proceedings of the Twenty-Ninth Text REtrieval Conference, 2020

A Lightweight Environment for Learning Experimental IR Research Practices.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Supporting Interoperability Between Open-Source Search Engines with the Common Index File Format.
Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, 2020

Exploring the Limits of Simple Learners in Knowledge Distillation for Document Classification with DocBERT.
Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

SimClusters: Community-Based Representations for Heterogeneous Recommendations at Twitter.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives.
Proceedings of the JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

An Open-Source Interface to the Canadian Surface Prediction Archive.
Proceedings of the JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

Content-Based Exploration of Archival Images Using Neural Networks.
Proceedings of the JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

Approximate Nearest Neighbor Search and Lightweight Dense Vector Reranking in Multi-Stage Retrieval Architectures.
Proceedings of the ICTIR '20: The 2020 ACM SIGIR International Conference on the Theory of Information Retrieval, 2020

Generalized and Scalable Optimal Sparse Decision Trees.
Proceedings of the 37th International Conference on Machine Learning, 2020

A Little Bit Is Worse Than None: Ranking with Limited Training Data.
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset.
Proceedings of the First Workshop on Scholarly Document Processing, 2020

Early Exiting BERT for Efficient Document Ranking.
Proceedings of SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, 2020

Cross-Lingual Training of Neural Models for Document Ranking.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Document Ranking with a Pretrained Sequence-to-Sequence Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Inserting Information Bottleneck for Attribution in Transformers.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Cydex: Neural Search Infrastructure for the Scholarly Literature.
Proceedings of the First Workshop on Scholarly Document Processing, 2020

Reproducibility is a Process, Not an Achievement: The Replicability of IR Reproducibility Experiments.
Proceedings of the Advances in Information Retrieval, 2020

Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants.
Proceedings of the Advances in Information Retrieval, 2020

From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance.
Proceedings of the Advances in Information Retrieval, 2020

Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Flexible IR Pipelines with Capreolus.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Update Delivery Mechanisms for Prospective Information Needs: A Reproducibility Study.
Proceedings of the CHIIR '20: Conference on Human Information Interaction and Retrieval, 2020

We Could, but Should We?: Ethical Considerations for Providing Access to GeoCities and Other Historical Digital Collections.
Proceedings of the CHIIR '20: Conference on Human Information Interaction and Retrieval, 2020

Evaluating Pretrained Transformer Models for Citation Recommendation.
Proceedings of the 10th International Workshop on Bibliometric-enhanced Information Retrieval co-located with 42nd European Conference on Information Retrieval, 2020

DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Showing Your Work Doesn't Always Work.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Two Birds, One Stone: A Simple, Unified Model for Text Generation from Structured and Unstructured Data.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
The neural hype, justified!: a recantation.
SIGIR Forum, 2019

Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits.
CoRR, 2019

Attentive Student Meets Multi-Task Teacher: Improved Knowledge Distillation for Pretrained Models.
CoRR, 2019

What Would Elsa Do? Freezing Layers During Transformer Fine-Tuning.
CoRR, 2019

Cross-Lingual Relevance Transfer for Document Retrieval.
CoRR, 2019

Explicit Pairwise Word Interaction Modeling Improves Pretrained Transformers for English Semantic Similarity Tasks.
CoRR, 2019

Multi-Stage Document Ranking with BERT.
CoRR, 2019

The Performance Envelope of Inverted Indexing on Modern Hardware.
CoRR, 2019

Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors.
CoRR, 2019

The Simplest Thing That Can Possibly Work: Pseudo-Relevance Feedback Using Text Classification.
CoRR, 2019

DocBERT: BERT for Document Classification.
CoRR, 2019

Document Expansion by Query Prediction.
CoRR, 2019

Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering.
CoRR, 2019

Simple BERT Models for Relation Extraction and Semantic Role Labeling.
CoRR, 2019

Distilling Task-Specific Knowledge from BERT into Simple Neural Networks.
CoRR, 2019

Simple Applications of BERT for Ad Hoc Document Retrieval.
CoRR, 2019

Matching Entities Across Different Knowledge Graphs with Graph Embeddings.
CoRR, 2019

Query and Answer Expansion from Conversation History.
Proceedings of the Twenty-Eighth Text REtrieval Conference, 2019

Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Challenges and Opportunities in Understanding Spoken Queries Directed at Modern Entertainment Platforms.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Yelling at Your TV: An Analysis of Speech Recognition Errors and Subsequent User Behavior on Entertainment Systems.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

The Impact of Score Ties on Repeatability in Document Ranking.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

University of Waterloo Docker Images for OSIRRC at SIGIR 2019.
Proceedings of the Open-Source IR Replicability Challenge co-located with 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Information Retrieval Meets Scalable Text Analytics: Solr Integration with Spark.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Solr Integration in the Anserini Information Retrieval Toolkit.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Overview of the 2019 Open-Source IR Replicability Challenge (OSIRRC 2019).
Proceedings of the Open-Source IR Replicability Challenge co-located with 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

The SIGIR 2019 Open-Source IR Replicability Challenge (OSIRRC 2019).
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

End-to-End Open-Domain Question Answering with BERTserini.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Detecting Customer Complaint Escalation with Recurrent Neural Networks and Manually-Engineered Features.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Simple Attention-Based Representation Learning for Ranking Short Social Media Posts.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Rethinking Complex Neural Network Architectures for Document Classification.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Scalable Content-Based Analysis of Images in Web Archives with TensorFlow and the Archives Unleashed Toolkit.
Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, 2019

Warclight: A Rails Engine for Web Archive Discovery.
Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, 2019

Building Community and Tools for Analyzing Web Archives Through Datathons.
Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, 2019

The Archives Unleashed Notebook: Madlibs for Jumpstarting Scholarly Exploration of Web Archives.
Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, 2019

The Cost of a WARC: Analyzing Web Archives in the Cloud.
Proceedings of the 19th ACM/IEEE Joint Conference on Digital Libraries, 2019

Universal voice-enabled user interfaces using JavaScript.
Proceedings of the 24th International Conference on Intelligent User Interfaces: Companion, 2019

Cross-Domain Modeling of Sentence-Level Evidence for Document Retrieval.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Applying BERT to Document Retrieval with Birch.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Aligning Cross-Lingual Entities with Multi-Aspect Information.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

What Part of the Neural Network Does This? Understanding LSTMs by Measuring and Dissecting Neurons.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Bridging the Gap between Relevance Matching and Semantic Matching for Short Text Similarity Modeling.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Incorporating Contextual and Syntactic Structures Improves Semantic Similarity Modeling.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Honkling: In-Browser Personalization for Ubiquitous Keyword Spotting.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Simple Techniques for Cross-Collection Relevance Feedback.
Proceedings of the Advances in Information Retrieval, 2019

Reproducing and Generalizing Semantic Term Matching in Axiomatic Information Retrieval.
Proceedings of the Advances in Information Retrieval, 2019

Identification and Ranking of Biomedical Informatics Researcher Citation Statistics through a Google Scholar Scraper.
Proceedings of the AMIA 2019, 2019

Natural Language Generation for Effective Knowledge Distillation.
Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP, 2019

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Summarization.
Proceedings of the Encyclopedia of Database Systems, Second Edition, 2018

The Neural Hype and Comparisons Against Weak Baselines.
SIGIR Forum, 2018

Anserini: Reproducible Ranking Baselines Using Lucene.
ACM J. Data Inf. Qual., 2018

Evaluation-as-a-Service for the Computational Sciences: Overview and Outlook.
ACM J. Data Inf. Qual., 2018

Scale Up or Scale Out for Graph Processing?
IEEE Internet Comput., 2018

Streaming Voice Query Recognition using Causal Convolutional Recurrent Neural Networks.
CoRR, 2018

FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks.
CoRR, 2018

Progress and Tradeoffs in Neural Language Models.
CoRR, 2018

JavaScript Convolutional Neural Networks for Keyword Spotting in the Browser: An Experimental Analysis.
CoRR, 2018

Adaptive Pruning of Neural Language Models for Mobile Devices.
CoRR, 2018

Repeatability Corner Cases in Document Ranking: The Impact of Score Ties.
CoRR, 2018

In-Browser Split-Execution Support for Interactive Analytics in the Cloud.
CoRR, 2018

Query Driven Algorithm Selection in Early Stage Retrieval.
Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018

H2oloo at TREC 2018: Cross-Collection Relevance Transfer for the Common Core Track.
Proceedings of the Twenty-Seventh Text REtrieval Conference, 2018

Overview of the TREC 2018 Real-Time Summarization Track.
Proceedings of the Twenty-Seventh Text REtrieval Conference, 2018

Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter.
Proceedings of the 2018 International Conference on Management of Data, 2018

What Do Viewers Say to Their TVs?: An Analysis of Voice Queries to Entertainment Systems.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Update Delivery Mechanisms for Prospective Information Needs: An Analysis of Attention in Mobile Users.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

The Evolution of Content Analysis for Personalized Recommendations at Twitter.
Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018

Pay-Per-Request Deployment of Neural Network Models Using Serverless Architectures.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018

Strong Baselines for Simple Question Answering over Knowledge Graphs with and without Neural Networks.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

CNNs for NLP in the Browser: Client-Side Deployment and Visualization Opportunities.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, 2018

Multi-Task Learning with Neural Networks for Voice Query Understanding on an Entertainment Platform.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

An Experimental Analysis of the Power Consumption of Convolutional Neural Networks for Keyword Spotting.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Deep Residual Learning for Small-Footprint Keyword Spotting.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

RecService: Distributed Real-Time Graph Processing at Twitter.
Proceedings of the 10th USENIX Workshop on Hot Topics in Cloud Computing, 2018

Computing without Servers, V8, Rocket Ships, and Other Batsh*t Crazy Ideas in Data Systems.
Proceedings of the First Biennial Conference on Design of Experimental Search & Information Retrieval Systems, 2018

Farewell Freebase: Migrating the SimpleQuestions Dataset to DBpedia.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Serverless Data Analytics with Flint.
Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018

2017
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing.
Proc. VLDB Endow., 2017

Warcbase: Scalable Analytics Infrastructure for Exploring Web Archives.
ACM Journal on Computing and Cultural Heritage, 2017

The role of index compression in score-at-a-time query evaluation.
Inf. Retr. J., 2017

The Lambda and the Kappa.
IEEE Internet Comput., 2017

In Defense of MapReduce.
IEEE Internet Comput., 2017

Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting.
CoRR, 2017

The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: A User Survey.
CoRR, 2017

An Exploration of Approaches to Integrating Neural Reranking Models in Multi-Stage Ranking Architectures.
CoRR, 2017

Exploring the Effectiveness of Convolutional Neural Networks for Answer Selection in End-to-End Question Answering.
CoRR, 2017

Integrating Lexical and Temporal Signals in Neural Ranking Models for Searching Social Media Streams.
CoRR, 2017

Distant Supervision for Topic Classification of Tweets in Curated Streams.
CoRR, 2017

Efficient and Effective Tail Latency Minimization in Multi-Stage Retrieval Systems.
CoRR, 2017

Ten Blue Links on Mars.
Proceedings of the 26th International Conference on World Wide Web, 2017

Partitioning and Segment Organization Strategies for Real-Time Selective Search on Document Streams.
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017

A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation.
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017

Topic Shifts Between Two US Presidential Administrations.
Proceedings of the 2017 Web Archiving & Digital Libraries Workshop (WADL 2017), 2017

Overview of the TREC 2017 Real-Time Summarization Track.
Proceedings of The Twenty-Sixth Text REtrieval Conference, 2017

In-Browser Interactive SQL Analytics with Afterburner.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Automatically Extracting High-Quality Negative Examples for Answer Selection in Question Answering.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Anserini: Enabling the Use of Lucene for Information Retrieval Research.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

On the Reusability of "Living Labs" Test Collections: A Case Study of Real-Time Summarization.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Finally, a Downloadable Test Collection of Tweets.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Online In-Situ Interleaved Evaluation of Real-Time Push Notification Systems.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Experiments with Convolutional Neural Network Models for Answer Selection.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Event Detection on Curated Tweet Streams.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

The Lucene for Information Access and Retrieval Research (LIARR) Workshop at SIGIR 2017.
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017

Deep learning-based assessment of tumor-associated stroma for diagnosing breast cancer in histopathology images.
Proceedings of the 14th IEEE International Symposium on Biomedical Imaging, 2017

Mining the Temporal Statistics of Query Terms for Searching Social Media Posts.
Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2017

Quantization in Append-Only Collections.
Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2017

An Exploration of Serverless Architectures for Information Retrieval.
Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2017

The Pareto Frontier of Utility Models as a Framework for Evaluating Push Notification Systems.
Proceedings of the ACM SIGIR International Conference on Theory of Information Retrieval, 2017

Do We Need Specialized Graph Databases?: Benchmarking Real-Time Social Networking Applications.
Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems, 2017

An Insight Extraction System on BioMedical Literature with Deep Neural Networks.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Talking to Your TV: Context-Aware Voice Search with Hierarchical Recurrent Neural Networks.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

A Comparison of Nuggets and Clusters for Evaluating Timeline Summaries.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017

Comparative Assessment of Alignment Algorithms for NGS Data: Features, Considerations, Implementations, and Future.
Proceedings of the Algorithms for Next-Generation Sequencing Data, Techniques, 2017

2016
NScale: neighborhood-centric large-scale graph analytics in the cloud.
VLDB J., 2016

GraphJet: Real-Time Content Recommendations at Twitter.
Proc. VLDB Endow., 2016

Sapphire: Querying RDF Data Made Simple.
Proc. VLDB Endow., 2016

The Future of Big Data Is ... JavaScript?
IEEE Internet Comput., 2016

Searching from Mars.
IEEE Internet Comput., 2016

The Effects of Latency Penalties in Evaluating Push Notification Systems.
CoRR, 2016

Afterburner: The Case for In-Browser Analytics.
CoRR, 2016

Dynamic Trade-Off Prediction in Multi-Stage Retrieval Systems.
CoRR, 2016

Overview of the TREC 2016 Real-Time Summarization Track.
Proceedings of The Twenty-Fifth Text REtrieval Conference, 2016

Sampling Strategies and Active Learning for Volume Estimation.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

An Exploration of Evaluation Metrics for Mobile Push Notifications.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Simple Dynamic Emission Strategies for Microblog Filtering.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

A Platform for Streaming Push Notifications to Mobile Assessors.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Interleaved Evaluation for Retrospective Summarization and Prospective Notification on Document Streams.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Burst Detection in Social Media Streams for Tracking Interest Profiles in Real Time.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

UMD-TTIC-UW at SemEval-2016 Task 1: Attention-Based Multi-Perspective Convolutional Neural Networks for Textual Similarity Measurement.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Estimating topical volume in social media streams.
Proceedings of the 31st Annual ACM Symposium on Applied Computing, 2016

Evaluating Search Among Secrets.
Proceedings of the Seventh International Workshop on Evaluating Information Access, 2016

Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement.
Proceedings of the NAACL HLT 2016, 2016

Prizm: A Wireless Access Point for Proxy-Based Web Lifelogging.
Proceedings of the first Workshop on Lifelogging Tools and Applications, 2016

Content Selection and Curation for Web Archiving: The Gatekeepers vs. the Masses.
Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, 2016

Desiderata for Exploratory Search Interfaces to Web Archives in Support of Scholarly Activities.
Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, 2016

Temporal Query Expansion Using a Continuous Hidden Markov Model.
Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval, 2016

Retrievability in API-Based "Evaluation as a Service".
Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval, 2016

Rank-at-a-Time Query Processing.
Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval, 2016

Total Recall: Blue Sky on Mars.
Proceedings of the 2016 ACM on International Conference on the Theory of Information Retrieval, 2016

Compressing and Decoding Term Statistics Time Series.
Proceedings of the Advances in Information Retrieval, 2016

Toward Reproducible Baselines: The Open-Source IR Reproducibility Challenge.
Proceedings of the Advances in Information Retrieval, 2016

Exploring and Discovering Archive-It Collections with Warcbase.
Proceedings of the 11th Annual International Conference of the Alliance of Digital Humanities Organizations, 2016

Noise-Contrastive Estimation for Answer Selection with Deep Neural Networks.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016

Optimizing Nugget Annotations with Active Learning.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016

Discovering key moments in social media streams.
Proceedings of the 13th IEEE Annual Consumer Communications & Networking Conference, 2016

In Vacuo and In Situ Evaluation of SIMD Codecs.
Proceedings of the 21st Australasian Document Computing Symposium, 2016

Dynamic Cutoff Prediction in Multi-Stage Retrieval Systems.
Proceedings of the 21st Australasian Document Computing Symposium, 2016

2015
Gappy Pattern Matching on GPUs for On-Demand Extraction of Hierarchical Translation Grammars.
Trans. Assoc. Comput. Linguistics, 2015

Report on the Evaluation-as-a-Service (EaaS) Expert Workshop.
SIGIR Forum, 2015

Report on the SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR).
SIGIR Forum, 2015

Is Big Data a Transient Problem?
IEEE Internet Comput., 2015

Evaluation-as-a-Service: Overview and Outlook.
CoRR, 2015

Learning to Discover Key Moments in Social Media Streams.
CoRR, 2015

Scaling Down Distributed Infrastructure on Wimpy Machines for Personal Web Archiving.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Overview of the TREC-2015 Microblog Track.
Proceedings of The Twenty-Fourth Text REtrieval Conference, 2015

Assessor Differences and User Preferences in Tweet Timeline Generation.
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR).
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

Developing an Open-Source Bibliometric Ranking Website Using Google Scholar Citation Profiles for Researchers in the Field of Biomedical Informatics.
Proceedings of the MEDINFO 2015: eHealth-enabled Health, 2015

Identifying Duplicate and Contradictory Information in Wikipedia.
Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015

The Sum of All Human Knowledge in Your Pocket: Full-Text Searchable Wikipedia on a Raspberry Pi.
Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, 2015

The Feasibility of Brute Force Scans for Real-Time Tweet Search.
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015

Anytime Ranking for Impact-Ordered Indexes.
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015

Building a Self-Contained Search Engine in the Browser.
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, 2015

Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Reproducible Experiments on Lexical and Temporal Feedback for Tweet Search.
Proceedings of the Advances in Information Retrieval, 2015

2014
Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval.
ACM Trans. Inf. Syst., 2014

Runtime Optimizations for Tree-Based Machine Learning Models.
IEEE Trans. Knowl. Data Eng., 2014

NScale: Neighborhood-centric Analytics on Large Graphs.
Proc. VLDB Endow., 2014

Real-Time Twitter Recommendation: Online Motif Detection in Large Dynamic Graphs.
Proc. VLDB Endow., 2014

Summingbird: A Framework for Integrating Batch and Online MapReduce Computations.
Proc. VLDB Endow., 2014

On the Feasibility and Implications of Self-Contained Search Engines in the Browser.
CoRR, 2014

Learning to efficiently rank on big data.
Proceedings of the 23rd International World Wide Web Conference, 2014

Information network or social network?: the structure of the Twitter follow graph.
Proceedings of the 23rd International World Wide Web Conference, 2014

Infrastructure for supporting exploration and discovery in web archives.
Proceedings of the 23rd International World Wide Web Conference, 2014

Infrastructure support for evaluation as a service.
Proceedings of the 23rd International World Wide Web Conference, 2014

Overview of the TREC-2014 Microblog Track.
Proceedings of The Twenty-Third Text REtrieval Conference, 2014

On run diversity in Evaluation as a Service.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Old dogs are great at new tricks: column stores for IR prototyping.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Temporal feedback for tweet search with non-parametric density estimation.
Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2014

Visual analytics of MOOCs at Maryland.
Proceedings of the First (2014) ACM Conference on Learning @ Scale, 2014

Using visualizations to monitor changes and harvest insights from a global-scale logging infrastructure at Twitter.
Proceedings of the 9th IEEE Conference on Visual Analytics Science and Technology, 2014

Partitioning strategies for spatio-textual similarity join.
Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, 2014

Optimization Techniques for "Scaling Down" Hadoop on Multi-Core, Shared-Memory Systems.
Proceedings of the 17th International Conference on Extending Database Technology, 2014

The Impact of Future Term Statistics in Real-Time Tweet Search.
Proceedings of the Advances in Information Retrieval, 2014

Column Stores as an IR Prototyping Tool.
Proceedings of the Advances in Information Retrieval, 2014

Supporting "Distant Reading" for Web Archives.
Proceedings of the 9th Annual International Conference of the Alliance of Digital Humanities Organizations, 2014

Cumulative Citation Recommendation: A Feature-Aware Comparison of Approaches.
Proceedings of the 25th International Workshop on Database and Expert Systems Applications, 2014

Do recommendations matter?: news recommendation in real life.
Proceedings of the Computer Supported Cooperative Work, 2014

2013
Fast candidate generation for real-time tweet search with Bloom filter chains.
ACM Trans. Inf. Syst., 2013

Evaluation as a service for information retrieval.
SIGIR Forum, 2013

Hone: "Scaling Down" Hadoop on Shared-Memory Systems.
Proc. VLDB Endow., 2013

Document vector representations for feature extraction in multi-stage document ranking.
Inf. Retr., 2013

Fast, Incremental Inverted Indexing in Main Memory for Web-Scale Collections.
CoRR, 2013

Monoidify! Monoids as a Design Principle for Efficient MapReduce Algorithms.
CoRR, 2013

MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!
Big Data, 2013

WTF: the who to follow service at Twitter.
Proceedings of the 22nd International World Wide Web Conference, 2013

Towards Efficient Large-Scale Feature-Rich Statistical Machine Translation.
Proceedings of the Eighth Workshop on Statistical Machine Translation, 2013

Overview of the TREC-2013 Microblog Track.
Proceedings of The Twenty-Second Text REtrieval Conference, 2013

CWI and TU Delft Notebook TREC 2013: Contextual Suggestion, Federated Web Search, KBA, and Web Tracks.
Proceedings of The Twenty-Second Text REtrieval Conference, 2013

Fast data in the era of big data: Twitter's real-time related query suggestion architecture.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Flat vs. hierarchical phrase-based translation models for cross-language information retrieval.
Proceedings of the 36th International ACM SIGIR conference on research and development in Information Retrieval, 2013

Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures.
Proceedings of the 36th International ACM SIGIR conference on research and development in Information Retrieval, 2013

Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Dynamic memory allocation policies for postings in real-time Twitter search.
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2013

Visualizing the "Pulse" of World Cities on Twitter.
Proceedings of the Seventh International Conference on Weblogs and Social Media, 2013

Training Efficient Tree-Based Models for Document Ranking.
Proceedings of the Advances in Information Retrieval, 2013

A month in the life of a production news recommender system.
Proceedings of the 2013 workshop on Living labs for information retrieval evaluation, 2013

Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Scaling big data mining infrastructure: the Twitter experience.
SIGKDD Explor., 2012

The Unified Logging Infrastructure for Data Analytics at Twitter.
Proc. VLDB Endow., 2012

Runtime Optimizations for Prediction with Tree-Based Models.
CoRR, 2012

A Study of "Churn" in Tweets and Real-Time Search Queries (Extended Version)
CoRR, 2012

Overview of the TREC-2012 Microblog Track.
Proceedings of The Twenty-First Text REtrieval Conference, 2012

Large-scale machine learning at Twitter.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2012

Looking inside the box: context-sensitive translation for cross-language information retrieval.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

Twanchor text: a preliminary study of the value of tweets as anchor text.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

On building a reusable Twitter corpus.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

Evaluating Real-Time Search over Tweets.
Proceedings of the Sixth International Conference on Weblogs and Social Media, 2012

A Study of "Churn" in Tweets and Real-Time Search Queries.
Proceedings of the Sixth International Conference on Weblogs and Social Media, 2012

Earlybird: Real-Time Search at Twitter.
Proceedings of the IEEE 28th International Conference on Data Engineering (ICDE 2012), 2012

Combining Statistical Translation Techniques for Cross-Language Information Retrieval.
Proceedings of the COLING 2012, 2012

Fast candidate generation for two-phase document ranking: postings list intersection with bloom filters.
Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 2012

2011
Special Issue on Cloud Computing.
J. Parallel Distributed Comput., 2011

Overview of the TREC 2011 Microblog Track.
Proceedings of The Twentieth Text REtrieval Conference, 2011

A cascade ranking model for efficient ranked retrieval.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

No free lunch: brute force vs. locality-sensitive hashing for cross-lingual pairwise similarity.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

Cross-corpus relevance projection.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

Pseudo test collections for learning web search ranking functions.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

Smoothing techniques for adaptive online language models: topic tracking in tweet streams.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

In-depth accounts and passing mentions in the news: connecting readers to the context of a news event.
Proceedings of the iConference 2011, 2011

Automatic management of partitioned, replicated search services.
Proceedings of the ACM Symposium on Cloud Computing in conjunction with SOSP 2011, 2011

When close enough is good enough: approximate positional indexes for efficient ranked retrieval.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

2010
Data-Intensive Text Processing with MapReduce.
Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, ISBN: 978-3-031-02136-7, 2010

UMD and USC/ISI: TREC 2010 Web Track Experiments with Ivory.
Proceedings of The Nineteenth Text REtrieval Conference, 2010

Learning to efficiently rank.
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010

Putting the User in the Loop: Interactive Maximal Marginal Relevance for Query-Focused Summarization.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Design patterns for efficient graph algorithms in MapReduce.
Proceedings of the Eighth Workshop on Mining and Learning with Graphs, 2010

Scaling Populations of a Genetic Algorithm for Job Shop Scheduling Problems Using MapReduce.
Proceedings of the Cloud Computing, Second International Conference, 2010

Ranking under temporal constraints.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

2009
Summarization.
Proceedings of the Encyclopedia of Database Systems, 2009

Computational linguistics for metadata building (CLiMB): using text mining for the automatic identification, categorization, and disambiguation of subject terms for image metadata.
Multim. Tools Appl., 2009

A cost-effective lexical acquisition process for large-scale thesaurus translation.
Lang. Resour. Evaluation, 2009

Special Issue of the Journal of Parallel and Distributed Computing: Cloud Computing.
J. Parallel Distributed Comput., 2009

Elements of a computational model for multi-party discourse: The turn-taking behavior of Supreme Court justices.
J. Assoc. Inf. Sci. Technol., 2009

Modeling actions of PubMed users with n-gram language models.
Inf. Retr., 2009

Where is the Cloud? Geography, Economics, Environment, and Jurisdiction in Cloud Computing.
First Monday, 2009

Is searching full text more effective than searching abstracts?
BMC Bioinform., 2009

Of Ivory and Smurfs: Loxodontan MapReduce Experiments for Web Search.
Proceedings of The Eighteenth Text REtrieval Conference, 2009

The Curse of Zipf and Limits to Parallelization: A Look at the Stragglers Problem in MapReduce.
Proceedings of the 7th Workshop on Large-Scale Distributed Systems for Information Retrieval, 2009

Brute force and indexed approaches to pairwise document similarity comparisons with MapReduce.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

Data Intensive Text Processing with MapReduce.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Users' adjustments to unsuccessful queries in biomedical search.
Proceedings of the 2009 Joint International Conference on Digital Libraries, 2009

You Are Where You Edit: Locating Wikipedia Contributors through Edit Histories.
Proceedings of the Third International Conference on Weblogs and Social Media, 2009

2008
Toward automatic facet analysis and need negotiation: Lessons from mediated search.
ACM Trans. Inf. Syst., 2008

Single-document and multi-document summarization techniques for email threads using sentence compression.
Inf. Process. Manag., 2008

Navigating information spaces: A case study of related article search in PubMed.
Inf. Process. Manag., 2008

PageRank without hyperlinks: Reranking with PubMed related article networks for biomedical text retrieval.
BMC Bioinform., 2008

Fast, Easy, and Cheap: Construction of Statistical Machine Translation Models with MapReduce.
Proceedings of the Third Workshop on Statistical Machine Translation, 2008

Multiple Alternative Sentence Compressions and Word-Pair Antonymy for Automatic Text Summarization and Recognizing Textual Entailment.
Proceedings of the First Text Analysis Conference, 2008

How do users find things with PubMed?: towards automatic utility evaluation with user simulations.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

Computational linguistics for metadata building.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2008

Scalable Language Processing Algorithms for the Masses: A Case Study in Computing Word Co-occurrence Matrices with MapReduce.
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008

Pairwise Document Similarity in Large Collections with MapReduce.
Proceedings of the ACL 2008, 2008

2007
An exploration of the principles underlying redundancy-based factoid question answering.
ACM Trans. Inf. Syst., 2007

Overview of the TREC 2006 ciQA task.
SIGIR Forum, 2007

Presentation schemes for component analysis in IR experiments.
SIGIR Forum, 2007

Syntactic sentence compression in the biomedical domain: facilitating access to related articles.
Inf. Retr., 2007

Multi-candidate reduction: Sentence compression as a tool for document summarization tasks.
Inf. Process. Manag., 2007

User simulations for evaluating answers to question series.
Inf. Process. Manag., 2007

Answering Clinical Questions with Knowledge-Based and Statistical Techniques.
Comput. Linguistics, 2007

PubMed related articles: a probabilistic topic-based model for content similarity.
BMC Bioinform., 2007

TREC 2007 ciQA Task: University of Maryland.
Proceedings of The Sixteenth Text REtrieval Conference, 2007

Overview of the TREC 2007 Question Answering Track.
Proceedings of The Sixteenth Text REtrieval Conference, 2007

Deconstructing nuggets: the stability and reliability of complex question answering evaluation.
Proceedings of the SIGIR 2007: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007

Is Question Answering Better than Information Retrieval? Towards a Task-Based Evaluation Framework for Question Series.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Concept Disambiguation for Improved Subject Access Using Multiple Knowledge Sources.
Proceedings of the Workshop on Language Technology for Cultural Heritage Data, 2007

Semantic Clustering of Answers to Clinical Questions.
Proceedings of the AMIA 2007, 2007

Different Structures for Evaluating Answers to Complex Questions: Pyramids Won't Topple, and Neither Will Human Assessors.
Proceedings of the ACL 2007, 2007

2006
Building a reusable test collection for question answering.
J. Assoc. Inf. Sci. Technol., 2006

Methods for automatically evaluating answers to complex questions.
Inf. Retr., 2006

TREC 2006 at Maryland: Blog, Enterprise, Legal and QA Tracks.
Proceedings of the Fifteenth Text REtrieval Conference, 2006

Overview of the TREC 2006 Question Answering Track.
Proceedings of the Fifteenth Text REtrieval Conference, 2006

Action modeling: language models that predict query behavior.
Proceedings of the SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

Exploring the limits of single-iteration clarification dialogs.
Proceedings of the SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

The role of knowledge in conceptual retrieval: a study in the domain of clinical medicine.
Proceedings of the SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

Will Pyramids Built of Nuggets Topple Over?
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

Leveraging Recurrent Phrase Structure in Large-scale Ontology Translation.
Proceedings of the 11th Annual conference of the European Association for Machine Translation, 2006

Generative Content Models for Structural Analysis of Medical Abstracts.
Proceedings of the Workshop on Linking Natural Language and Biology, 2006

Identification of user sessions with hierarchical agglomerative clustering.
Proceedings of the Information Realities: Shaping the Digital Future for All, 2006

Evaluation of PICO as a Knowledge Representation for Clinical Questions.
Proceedings of the AMIA 2006, 2006

Leveraging Reusability: Cost-Effective Lexical Acquisition for Large-Scale Ontology Translation.
Proceedings of the ACL 2006, 2006

The Role of Information Retrieval in Answering Complex Questions.
Proceedings of the ACL 2006, 2006

Answer Extraction, Semantic Clustering, and Extractive Summarization for Clinical Question Answering.
Proceedings of the ACL 2006, 2006

2005
A Menagerie of Tracks at Maryland: HARD, Enterprise, QA, and Genomics, Oh My!
Proceedings of the Fourteenth Text REtrieval Conference, 2005

Fusion of Knowledge-Intensive and Statistical Approaches for Retrieving and Annotating Textual Genomics Documents.
Proceedings of the Fourteenth Text REtrieval Conference, 2005

Assessing the term independence assumption in blind relevance feedback.
Proceedings of the SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005

Evaluation of resources for question answering evaluation.
Proceedings of the SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005

Automatically Evaluating Answers to Definition Questions.
Proceedings of the HLT/EMNLP 2005, 2005

"Bag of Words" is not enough for Strength of Evidence Classification.
Proceedings of the AMIA 2005, 2005

Evaluating Summaries and Answers: Two Sides of the Same Coin?
Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization@ACL 2005, 2005

2004
Event structure and the encoding of arguments: the syntax of the Mandarin and English verb phrase.
PhD thesis, 2004

Answering Multiple Questions on a Topic From Heterogeneous Resources.
Proceedings of the Thirteenth Text REtrieval Conference, 2004

Answering Questions About Moving Objects in Videos.
Proceedings of the New Directions in Question Answering, 2004

Viewing the Web as a Virtual Database for Question Answering.
Proceedings of the New Directions in Question Answering, 2004

A Computational Framework for Non-Lexicalist Semantics.
Proceedings of the Student Research Workshop at HLT-NAACL 2004, 2004

Answering Definition Questions Using Multiple Knowledge Sources.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004

2003
Integrating Web-based and Corpus-based Techniques for Question Answering.
Proceedings of The Twelfth Text REtrieval Conference, 2003

Quantitative evaluation of passage retrieval algorithms for question answering.
Proceedings of the SIGIR 2003: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 28, 2003

Answering Questions about Moving Objects in Surveillance Videos.
Proceedings of the New Directions in Question Answering, 2003

Sticky notes for the semantic web.
Proceedings of the 8th International Conference on Intelligent User Interfaces, 2003

What Makes a Good Answer? The Role of Context in Question Answering.
Proceedings of the Human-Computer Interaction INTERACT '03: IFIP TC13 International Conference on Human-Computer Interaction, 2003

START: A Framework for Facilitating E-Rulemaking.
Proceedings of the 2003 Annual National Conference on Digital Government Research, 2003

Better Public Policy Through Natural Language Information Access.
Proceedings of the 2003 Annual National Conference on Digital Government Research, 2003

Question answering from the web using knowledge annotation and knowledge mining techniques.
Proceedings of the 2003 ACM CIKM International Conference on Information and Knowledge Management, 2003

The role of context in question answering systems.
Proceedings of the Extended abstracts of the 2003 Conference on Human Factors in Computing Systems, 2003

Extracting Structural Paraphrases from Aligned Monolingual Corpora.
Proceedings of the Second International Workshop on Paraphrasing, 2003

2002
Extracting Answers from the Web Using Data Annotation and Knowledge Mining Techniques.
Proceedings of The Eleventh Text REtrieval Conference, 2002

Web question answering: is more always better?
Proceedings of the SIGIR 2002: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002

Omnibase: Uniform Access to Heterogeneous Data for Question Answering.
Proceedings of the Natural Language Processing and Information Systems, 2002

The START Multimedia Information System: Current Technology and Future Directions.
Proceedings of the MIS 2002, International Workshop on Multimedia Information Systems, October 10, 2002

The Web as a Resource for Question Answering: Perspectives and Challenges.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Natural Language Annotations for the Semantic Web.
Proceedings of the On the Move to Meaningful Internet Systems, 2002 - DOA/CoopIS/ODBASE 2002 Confederated International Conferences DOA, CoopIS and ODBASE 2002 Irvine, California, USA, October 30, 2002

Annotating the Semantic Web Using Natural Language.
Proceedings of the 2nd Workshop on NLP and XML, 2002

2001
The Role of a Natural Language Conversational Interface in Online Sales: A Case Study.
Int. J. Speech Technol., 2001

Data-Intensive Question Answering.
Proceedings of The Tenth Text REtrieval Conference, 2001

Gathering Knowledge for a Question Answering System from Heterogeneous Information Sources.
Proceedings of the Workshop on Human Language Technology and Knowledge Management@ACL 2001, 2001

2000
Comparative Evaluation of a Natural Language Dialog Based System and a Menu Driven System for Information Access: a Case Study.
Proceedings of the Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications), 2000

1999
Integrating Web Resources and Lexicons into a Natural Language Query System.
Proceedings of the IEEE International Conference on Multimedia Computing and Systems, 1999

