Scott Yih

Affiliations:
  • Meta AI


According to our database, Scott Yih authored at least 154 papers between 1997 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference.
CoRR, 2024

CRAG - Comprehensive RAG Benchmark.
CoRR, 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
CoRR, 2024

FLAME: Factuality-Aware Alignment for Large Language Models.
CoRR, 2024

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM.
CoRR, 2024

Reliable, Adaptable, and Attributable Language Models with Retrieval.
CoRR, 2024

REPLUG: Retrieval-Augmented Black-Box Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Trusting Your Evidence: Hallucinate Less with Context-aware Decoding.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

In-Context Pretraining: Language Modeling Beyond Document Boundaries.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

RA-DIT: Retrieval-Augmented Dual Instruction Tuning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Altogether: Image Captioning via Re-aligning Alt-text.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

MoDE: CLIP Data Experts via Clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Instruction-tuned Language Models are Better Knowledge Learners.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Efficient Open Domain Multi-Hop Question Answering with Few-Shot Data Synthesis.
CoRR, 2023

Large Language Model Programs.
CoRR, 2023

VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation.
CoRR, 2023

Coder Reviewer Reranking for Code Generation.
Proceedings of the International Conference on Machine Learning, 2023

Retrieval-Augmented Multimodal Language Modeling.
Proceedings of the International Conference on Machine Learning, 2023

LEVER: Learning to Verify Language-to-Code Generation with Execution.
Proceedings of the International Conference on Machine Learning, 2023

DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation.
Proceedings of the International Conference on Machine Learning, 2023

InCoder: A Generative Model for Code Infilling and Synthesis.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Adapting Pretrained Text-to-Text Models for Long Text Sequences.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

How to Train Your Dragon: Diverse Augmentation Towards Generalizable Dense Retrieval.
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Reimagining Retrieval Augmented Language Models for Answering Queries.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

One Embedder, Any Task: Instruction-Finetuned Text Embeddings.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

Nonparametric Masked Language Modeling.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

Task-aware Retrieval with Instructions.
Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences.
CoRR, 2022

Structured Prompt Tuning.
CoRR, 2022

QUASER: Question Answering with Scalable Extractive Rationalization.
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), Madrid, Spain, July 11, 2022

BiT: Robustly Binarized Multi-distilled Transformer.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Autoregressive Search Engines: Generating Substrings as Document Identifiers.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Simple Local Attentions Remain Competitive for Long-Context Tasks.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Domain-matched Pre-training Tasks for Dense Retrieval.
Findings of the Association for Computational Linguistics: NAACL 2022, 2022

UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering.
Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Boosted Dense Retriever.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training.
Findings of the Association for Computational Linguistics: NAACL 2022, 2022

DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Improving Passage Retrieval with Zero-Shot Question Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?
Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

On Continual Model Refinement in Out-of-Distribution Data Streams.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
The Web Is Your Oyster - Knowledge-Intensive NLP against a Very Large Web Corpus.
CoRR, 2021

On Unifying Misinformation Detection.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

RECONSIDER: Improved Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval.
Proceedings of the 9th International Conference on Learning Representations, 2021

On the Influence of Masking Policies in Intermediate Pre-training.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Joint Verification and Reranking for Open Fact Checking Over Tables.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Multi-Task Retrieval for Knowledge-Intensive Tasks.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Studying Strategically: Learning to Mask for Closed-book QA.
CoRR, 2020

Unified Open-Domain Question Answering with Structured and Unstructured Knowledge.
CoRR, 2020

RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering.
CoRR, 2020

Language Models as Fact Checkers?
CoRR, 2020

Dense Passage Retrieval for Open-Domain Question Answering.
CoRR, 2020

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Abductive Commonsense Reasoning.
Proceedings of the 8th International Conference on Learning Representations, 2020

An Imitation Game for Learning Semantic Parsers from User Interaction.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Blockwise Self-Attention for Long Document Understanding.
Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Unsupervised Question Decomposition for Question Answering.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Efficient One-Pass End-to-End Entity Linking for Questions.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Dense Passage Retrieval for Open-Domain Question Answering.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Open-Domain Question Answering.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, 2020

2019
Be Consistent! Improving Procedural Text Comprehension using Label Consistency.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

FlowQA: Grasping Flow in History for Conversational Machine Comprehension.
Proceedings of the 7th International Conference on Learning Representations, 2019

Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Everything Happens for a Reason: Discovering the Purpose of Actions in Procedural Text.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

QUAREL: A Dataset and Models for Answering Questions about Qualitative Relationships.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Natural Language to Structured Query Generation via Meta-Learning.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Tracking State Changes in Procedural Text: a Challenge Dataset and Models for Process Paragraph Comprehension.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Reasoning about Actions and State Changes by Injecting Commonsense Knowledge.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Dissecting Contextual Word Embeddings: Architecture and Representation.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

QuAC: Question Answering in Context.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

A Knowledge-Grounded Neural Conversation Model.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Cross-Sentence N-ary Relation Extraction with Graph LSTMs.
Trans. Assoc. Comput. Linguistics, 2017

Maximum Margin Reward Networks for Learning from Explicit and Implicit Supervision.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

NLP for Precision Medicine.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Search-based Neural Structured Learning for Sequential Question Answering.
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

2016
Basic Reasoning with Tensor Product Representations.
CoRR, 2016

Reasoning in Vector Space: An Exploratory Study of Question Answering.
Proceedings of the 4th International Conference on Learning Representations, 2016

Answering Complicated Question Intents Expressed in Decomposed Question Sequences.
CoRR, 2016

Table Cell Search for Question Answering.
Proceedings of the 25th International Conference on World Wide Web, 2016

Question Answering with Knowledge Base, Web and Beyond.
Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016

Story Cloze Evaluator: Vector Space Representation Evaluation by Predicting What Happens Next.
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, 2016

Learning from Explicit and Implicit Supervision Jointly For Algebra Word Problems.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

The Value of Semantic Parse Labeling for Knowledge Base Question Answering.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

2015
Embedding Entities and Relations for Learning and Inference in Knowledge Bases.
Proceedings of the 3rd International Conference on Learning Representations, 2015

Questions vs. Queries in Informational Search Tasks.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Open Domain Question Answering via Semantic Enrichment.
Proceedings of the 24th International Conference on World Wide Web, 2015

Deep Learning and Continuous Representations for Natural Language Processing.
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2015), Denver, Colorado, USA, May 31, 2015

WikiQA: A Challenge Dataset for Open-Domain Question Answering.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality.
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 2015

2014
Learning Multi-Relational Semantics Using Neural-Embedding Models.
CoRR, 2014

Joint semantic utterance classification and slot filling with recursive neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Typed Tensor Decomposition of Knowledge Bases for Relation Extraction.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Semantic Parsing for Single-Relation Question Answering.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Learning Continuous Phrase Representations for Translation Modeling.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013
Dual Coordinate Descent Algorithms for Efficient Large Margin Structured Prediction.
Trans. Assoc. Comput. Linguistics, 2013

Learning Semantic Representations for the Phrase Translation Model.
CoRR, 2013

Combining Heterogeneous Models for Measuring Relational Similarity.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Linguistic Regularities in Continuous Space Word Representations.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Animacy Detection with Voting Models.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

Multi-Relational Latent Semantic Analysis.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

Question Answering Using Enhanced Lexical Semantic Models.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

2012
Measuring Word Relatedness Using Heterogeneous Vector Space Models.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

MSR SPLAT, a language analysis toolkit.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

Polarity Inducing Latent Semantic Analysis.
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2012

2011
Clickthrough-based latent semantic models for web search.
Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011

Domain Adaptation with Ensemble of Feature Groups.
Proceedings of the IJCAI 2011, 2011

Learning Discriminative Projections for Text Similarity Measures.
Proceedings of the Fifteenth Conference on Computational Natural Language Learning, 2011

2010
Adaptive near-duplicate detection via similarity learning.
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010

Translingual Document Representations from Discriminative Projections.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

2009
Learning Term-weighting Functions for Similarity Measures.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

2008
The Importance of Syntactic Parsing and Inference in Semantic Role Labeling.
Comput. Linguistics, 2008

Partitioned logistic regression for spam filtering.
Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008

Personalized Spam Filtering for Gray Mail.
Proceedings of the CEAS 2008, 2008

Consistent phrase relevance measures.
Proceedings of the 2nd International Workshop on Data Mining and Audience Intelligence for Advertising, 2008

2007
Site-Independent Template-Block Detection.
Knowledge Discovery in Databases: PKDD 2007, 2007

Raising the baseline for high-precision text classifiers.
Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007

Multi-Document Summarization by Maximizing Informative Content-Words.
Proceedings of the IJCAI 2007, 2007

Improve Spam Filtering by Detecting Gray Mail.
Proceedings of the CEAS 2007, 2007

Improving Similarity Measures for Short Segments of Text.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007

2006
Finding advertising keywords on web pages.
Proceedings of the 15th international conference on World Wide Web, 2006

Automatic Semantic Role Labeling.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2006

Learning at Low False Positive Rates.
Proceedings of the CEAS 2006, 2006

Online Discriminative Spam Filter Training.
Proceedings of the CEAS 2006, 2006

Improved Discriminative Bilingual Word Alignment.
Proceedings of the ACL 2006, 2006

2005
Learning and Inference for Information Extraction.
PhD thesis, 2005

Demonstrating an Interactive Semantic Role Labeling System.
Proceedings of the HLT/EMNLP 2005, 2005

Learning and Inference over Constrained Output.
Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, Scotland, UK, July 30, 2005

The Necessity of Syntactic Parsing for Semantic Role Labeling.
Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, Scotland, UK, July 30, 2005

Integer linear programming inference for conditional random fields.
Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), 2005

Generalized Inference with Multiple Semantic Role Labeling Systems.
Proceedings of the Ninth Conference on Computational Natural Language Learning, 2005

2004
Mining Online Deal Forums for Hot Deals.
Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2004), 2004

A Linear Programming Formulation for Global Inference in Natural Language Tasks.
Proceedings of the Eighth Conference on Computational Natural Language Learning, 2004

Semantic Role Labeling Via Generalized Inference Over Classifiers.
Proceedings of the Eighth Conference on Computational Natural Language Learning, 2004

Semantic Role Labeling Via Integer Linear Programming Inference.
Proceedings of the COLING 2004, 2004

2002
Question-Answering via Enhanced Understanding of Questions.
Proceedings of The Eleventh Text REtrieval Conference, 2002

Probabilistic Reasoning for Entity & Relation Recognition.
Proceedings of the 19th International Conference on Computational Linguistics, 2002

2001
Learning Components for A Question-Answering System.
Proceedings of The Tenth Text REtrieval Conference, 2001

Relational Learning via Propositional Algorithms: An Information Extraction Case Study.
Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001

1997
Template-Based Information Mining from HTML Documents.
Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, 1997

