David Yarowsky

According to our database, David Yarowsky authored at least 111 papers between 1992 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






On csauthors.net:


Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!
CoRR, 2024

Evaluating Large Language Models along Dimensions of Language Variation: A Systematic Investigation of Cross-lingual Generalization.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

On the Robustness of Cognate Generation Models.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

UniMorph 4.0: Universal Morphology.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Known Words Will Do: Unknown Concept Translation via Lexical Relations.
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages, 2022

Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Sequence Models for Computational Etymology of Borrowings.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

On Pronunciations in Wiktionary: Extraction and Experiments on Multilingual Syllabification and Stress Prediction.
Proceedings of the 14th Workshop on Building and Using Comparable Corpora, 2021

Induced Inflection-Set Keyword Search in Speech.
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, 2020

Computational Etymology and Word Emergence.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Multilingual Dictionary Based Construction of Core Vocabulary.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Measuring the Similarity of Grammatical Gender Systems by Comparing Partitions.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Wiktionary Normalization of Translations and Morphological Information.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Neural Transduction for Multilingual Lexical Translation.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Massively Multilingual Adversarial Speech Recognition.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Modeling Color Terminology Across Thousands of Languages.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Zero-Shot Pronunciation Lexicons for Cross-Language Acoustic Model Transfer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Learning Morphosyntactic Analyzers from the Bible via Iterative Annotation Projection across 26 Languages.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Massively Translingual Compound Analysis and Translation Discovery.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

A Comparative Study of Extremely Low-Resource Transliteration of the World's Languages.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Creating Large-Scale Multilingual Cognate Tables.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Creating a Translation Matrix of the Bible's Names Across 591 Languages.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

UniMorph 2.0: Universal Morphology.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

The CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection.
Proceedings of the CoNLL SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection, Brussels, October 31, 2018

Improving Low Resource Machine Translation using Morphological Glosses (Non-archival Extended Abstract).
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, 2018

Marrying Universal Dependencies and Universal Morphology.
Proceedings of the Second Workshop on Universal Dependencies, 2018

Deriving Consensus for Multi-Parallel Corpora: an English Bible Study.
Proceedings of the Eighth International Joint Conference on Natural Language Processing, 2017

Paradigm Completion for Derivational Morphology.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages.
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, 2017

A Distributed Representation-Based Framework for Cross-Lingual Transfer Parsing.
J. Artif. Intell. Res., 2016

The SIGMORPHON 2016 Shared Task - Morphological Reinflection.
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, 2016

Remote Elicitation of Inflectional Paradigms to Seed Morphological Analysis in Low-Resource Languages.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Very-large Scale Parsing and Normalization of Wiktionary Morphological Paradigms.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Automatic Construction of Morphologically Motivated Translation Models for Highly Inflected, Low-Resource Languages.
Proceedings of the 12th Conference of the Association for Machine Translation in the Americas: MT Researchers' Track, 2016

A Representation Learning Framework for Multi-Source Transfer Parsing.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

A Universal Feature Schema for Rich Morphological Annotation and Fine-Grained Cross-Lingual Part-of-Speech Tagging.
Proceedings of the Systems and Frameworks for Computational Morphology, 2015

Social Media Predictive Analytics.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

A Language-Independent Feature Schema for Inflectional Morphology.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Cross-lingual Dependency Parsing Based on Distributed Representations.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

A keyword search system using open source software.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Learning Domain-Specific, L1-Specific Measures of Word Readability.
Trait. Autom. des Langues, 2013

Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Quantifying the value of pronunciation lexicons for keyword search in low-resource languages.
Proceedings of the IEEE International Conference on Acoustics, 2013

Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

Exploring Sentiment in Social Media: Bootstrapping Subjectivity Clues from Multilingual Twitter Streams.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Stylometric Analysis of Scientific Articles.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

Hierarchical Bayesian Models for Latent Attribute Detection in Social Media.
Proceedings of the Fifth International Conference on Weblogs and Social Media, 2011

NADA: A Robust System for Non-referential Pronoun Detection.
Proceedings of the Anaphora Processing and Applications, 2011

Typed Graph Models for Learning Latent Attributes from Names.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011

Word Sense Disambiguation.
Proceedings of the Handbook of Natural Language Processing, Second Edition., 2010

Web N-gram workshop 2010.
SIGIR Forum, 2010

New Tools for Web-Scale N-grams.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Classifying latent user attributes in twitter.
Proceedings of the 2nd International Workshop on Search and Mining User-Generated Contents, 2010

Ranking and Semi-supervised Classification on Large Scale Graphs Using Map-Reduce.
Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, 2009

HLTCOE Approaches to Knowledge Base Population at TAC 2009.
Proceedings of the Second Text Analysis Conference, 2009

Structural, Transitive and Latent Models for Biographic Fact Extraction.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Improving Translation Lexicon Induction from Monolingual Corpora via Dependency Contexts and Part-of-Speech Equivalences.
Proceedings of the Thirteenth Conference on Computational Natural Language Learning, 2009

Arabic Cross-Document Coreference Resolution.
Proceedings of the ACL 2009, 2009

Modeling Latent Biographic Attributes in Conversational Genres.
Proceedings of the ACL 2009, 2009

Cross-Document Coreference Resolution: A Key Technology for Learning by Reading.
Proceedings of the Learning by Reading and Learning to Read, 2009

Minimally Supervised Multilingual Taxonomy and Translation Lexicon Induction.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Translating Compounds by Learning Component Gloss Translation Models via Multiple Languages.
Proceedings of the Third International Joint Conference on Natural Language Processing, 2008

Mining and Modeling Relations between Formal and Informal Chinese Phrases from Web Corpora.
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008

Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora.
Proceedings of the ACL 2008, 2008

JHU1: An Unsupervised Approach to Person Name Disambiguation using Web Snippets.
Proceedings of the 4th International Workshop on Semantic Evaluations, 2007

Resolving and Generating Definite Anaphora by Modeling Hypernymy using Unlabeled Corpora.
Proceedings of the Tenth Conference on Computational Natural Language Learning, 2006

Minimally Supervised Morphological Segmentation with Applications to Machine Translation.
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, 2006

Machine Translation for Languages Lacking Bitext via Multilingual Gloss Transduction.
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers, 2006

Multi-Field Information Extraction and Cross-Document Fusion.
Proceedings of the ACL 2005, 2005

Induction of Fine-Grained Part-of-Speech Taggers via Classifier Combination and Crosslingual Projection.
Proceedings of the Workshop on Building and Using Parallel Texts@ACL 2005, 2005

Exploiting Aggregate Properties of Bilingual Dictionaries For Distinguishing Senses of English Words and Inducing English Sense Clusters.
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, July 21-26, 2004, 2004

Improving Bitext Word Alignments via Syntax-based Reordering of English.
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, July 21-26, 2004, 2004

Desparately Seeking Cebuano.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

Minimally Supervised Induction of Grammatical Gender.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

Statistical Machine Translation Using Coercive Two-Level Syntactic Transduction.
Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2003

Unsupervised Personal Name Disambiguation.
Proceedings of the Seventh Conference on Natural Language Learning, 2003

Evaluating sense disambiguation across diverse parameter spaces.
Nat. Lang. Eng., 2002

Combining Classifiers for word sense disambiguation.
Nat. Lang. Eng., 2002

Modeling Consensus: Classifier Combination for Word Sense Disambiguation.
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002

Augmented Mixture Models for Lexical Disambiguation.
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing, 2002

Inducing Translation Lexicons via Diverse Similarity Measures and Bridge Languages.
Proceedings of the 6th Conference on Natural Language Learning, 2002

Language Independent NER using a Unified Model of Internal and Contextual Evidence.
Proceedings of the 6th Conference on Natural Language Learning, 2002

Bootstrapping a Multilingual Part-of-speech Tagger in One Person-day.
Proceedings of the 6th Conference on Natural Language Learning, 2002

Inducing Information Extraction Systems for New Languages via Cross-language Projection.
Proceedings of the 19th International Conference on Computational Linguistics, 2002

The Johns Hopkins SENSEVAL-2 System Descriptions.
Proceedings of Second International Workshop on Evaluating Word Sense Disambiguation Systems, 2001

Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora.
Proceedings of the Language Technologies 2001: The Second Meeting of the North American Chapter of the Association for Computational Linguistics, 2001

Multipath Translation Lexicon Induction via Bridge Languages.
Proceedings of the Language Technologies 2001: The Second Meeting of the North American Chapter of the Association for Computational Linguistics, 2001

Hierarchical Decision Lists for Word Sense Disambiguation.
Comput. Humanit., 2000

Minimally Supervised Morphological Analysis by Multimodal Alignment.
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000

Rule Writing or Annotation: Cost-efficient Resource Usage for Base Noun Phrase Chunking.
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000

Language Independent, Minimally Supervised Induction of Lexical Probabilities.
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000

Distinguishing systems and distinguishing senses: new evaluation methods for Word Sense Disambiguation.
Nat. Lang. Eng., 1999

Taking the load off the conference chairs-towards a digital paper-routing assistant.
Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999

Language Independent Named Entity Recognition Combining Morphological and Contextual Evidence.
Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999

Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation.
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999

Discrimination decisions for 100,000-dimensional spaces.
Ann. Oper. Res., 1995

Unsupervised Word Sense Disambiguation Rivaling Supervised Methods.
Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995

Homograph disambiguation in speech synthesis.
Proceedings of the Second ESCA/IEEE Workshop on Speech Synthesis, 1994

Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French.
Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994

One Sense per Collocation.
Proceedings of the Human Language Technology: Proceedings of a Workshop Held at Plainsboro, 1993

A method for disambiguating word senses in a large corpus.
Comput. Humanit., 1992

One Sense Per Discourse.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Harriman, 1992

A corpus-based synthesizer.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

Word-Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora.
Proceedings of the 14th International Conference on Computational Linguistics, 1992

Estimating Upper and Lower Bounds on the Performance of Word-Sense Disambiguation Programs.
Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, 28 June, 1992
