Colin Cherry

According to our database, Colin Cherry authored at least 104 papers between 1961 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Overestimation in LLM Evaluation: A Controlled Large-Scale Study on Data Contamination's Impact on Machine Translation.
CoRR, January, 2025

To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation.
Trans. Assoc. Comput. Linguistics, 2024

On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation.
CoRR, 2024

Don't Throw Away Data: Better Sequence Knowledge Distillation.
CoRR, 2024

Translating Step-by-Step: Decomposing the Translation Process for Improved Translation Quality of Long-Form Texts.
Proceedings of the Ninth Conference on Machine Translation, 2024

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Barriers to Effective Evaluation of Simultaneous Interpretation.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, 2024

Quality-Aware Translation Models: Efficient Generation and Quality Estimation in a Single Model.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Quality Control at Your Fingertips: Quality-Aware Translation Models.
CoRR, 2023

PaLM 2 Technical Report.
CoRR, 2023

The unreasonable effectiveness of few-shot learning for machine translation.
CoRR, 2023

The Unreasonable Effectiveness of Few-shot Learning for Machine Translation.
Proceedings of the International Conference on Machine Learning, 2023

Prompting PaLM for Translation: Assessing Strategies and Performance.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Data Scaling Laws in NMT: The Effect of Noise and Architecture.
CoRR, 2022

mSLAM: Massively multilingual joint pre-training for speech and text.
CoRR, 2022

Exploring the Benefits and Limitations of Multilinguality for Non-autoregressive Machine Translation.
Proceedings of the Seventh Conference on Machine Translation, 2022

Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

XTREME-S: Evaluating Cross-lingual Speech Representations.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture.
Proceedings of the International Conference on Machine Learning, 2022

Scaling Laws for Neural Machine Translation.
Proceedings of the Tenth International Conference on Learning Representations, 2022

A Natural Diet: Towards Improving Naturalness of Machine Translation Output.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Can Multilinguality benefit Non-autoregressive Machine Translation?
CoRR, 2021

Assessing Reference-Free Peer Evaluation for Machine Translation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Inverted Projection for Robust Speech Translation.
Proceedings of the 18th International Conference on Spoken Language Translation, 2021

Subtitle Translation as Markup Translation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sentence Boundary Augmentation for Neural Machine Translation Robustness.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Human-Paraphrased References Improve Neural Machine Translation.
Proceedings of the Fifth Conference on Machine Translation, 2020

Re-translation versus Streaming for Simultaneous Translation.
Proceedings of the 17th International Conference on Spoken Language Translation, 2020

Shaping the Narrative Arc: Information-Theoretic Collaborative Dialogue.
Proceedings of the Eleventh International Conference on Computational Creativity, 2020

Re-Translation Strategies for Long Form, Simultaneous, Spoken Language Translation.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Inference Strategies for Machine Translation with Conditional Masking.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Simultaneous Translation.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, 2020

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges.
CoRR, 2019

Thinking Slow about Latency Evaluation for Simultaneous Machine Translation.
CoRR, 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue.
CoRR, 2019

Reinforcement Learning based Curriculum Optimization for Neural Machine Translation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Efficient Sequence Labeling with Actor-Critic Training.
Proceedings of the Advances in Artificial Intelligence, 2019

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Revisiting Character-Based Neural Machine Translation with Capacity and Compression.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

End-to-End Multi-View Networks for Text Classification.
CoRR, 2017

NRC Machine Translation System for WMT 2017.
Proceedings of the Second Conference on Machine Translation, 2017

A Challenge Set Approach to Evaluating Machine Translation.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Cost Weighting for Neural Machine Translation Domain Adaptation.
Proceedings of the First Workshop on Neural Machine Translation, 2017

NRC Russian-English Machine Translation System for WMT 2016.
Proceedings of the First Conference on Machine Translation, 2016

SemEval-2016 Task 6: Detecting Stance in Tweets.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Integrating Morphological Desegmentation into Phrase-based Decoding.
Proceedings of the NAACL HLT 2016, 2016

An Empirical Evaluation of Noise Contrastive Estimation for the Neural Network Joint Model of Translation.
Proceedings of the NAACL HLT 2016, 2016

A Dataset for Detecting Stance in Tweets.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Bilingual Methods for Adaptive Training Data Selection for Machine Translation.
Proceedings of the 12th Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track, 2016

What Matters Most in Morphologically Segmented SMT Models?
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2015

Morpho-syntactic Regularities in Continuous Word Representations: A multilingual study.
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 2015

Inflection Generation as Discriminative String Transduction.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

The Unreasonable Effectiveness of Word Representations for Twitter Named Entity Recognition.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

NRC: Infused Phrase Vectors for Named Entity Recognition in Twitter.
Proceedings of the Workshop on Noisy User-generated Text, 2015

A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

NRC-Canada-2014: Detecting Aspects and Sentiment in Customer Reviews.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

Lattice Desegmentation for Statistical Machine Translation.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

A Graph-Partitioning Framework for Aligning Hierarchical Topic Structures to Presentations.
IEEE Trans. Speech Audio Process., 2013

Detecting concept relations in clinical text: Insights from a state-of-the-art model.
J. Biomed. Informatics, 2013

À la Recherche du Temps Perdu: extracting temporal relations from medical text in the 2012 i2b2 NLP challenge.
J. Am. Medical Informatics Assoc., 2013

Reversing Morphological Tokenization in English-to-Arabic SMT.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Improved Reordering for Phrase-Based Translation using Sparse Features.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013

Regularized Minimum Error Rate Training.
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013

On Hierarchical Re-ordering and Permutation Parsing for Phrase-based Decoding.
Proceedings of the Seventh Workshop on Statistical Machine Translation, 2012

MSR SPLAT, a language analysis toolkit.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

Batch Tuning Strategies for Statistical Machine Translation.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

Paraphrasing for Style.
Proceedings of the COLING 2012, 2012

Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010.
J. Am. Medical Informatics Assoc., 2011

Indexing Spoken Documents with Hierarchical Semantic Structures: Semantic Tree-to-string Alignment Models.
Proceedings of the Fifth International Joint Conference on Natural Language Processing, 2011

Data-Driven Response Generation in Social Media.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011

Lexically-Triggered Hidden Markov Models for Clinical Document Coding.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 2011

Joint Training of Dependency Parsing Filters through Latent Support Vector Machines.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

Statistical Machine Translation Philipp Koehn (University of Edinburgh) Cambridge University Press, 2010, xii+433 pp; ISBN 978-0-521-87415-1, $60.00.
Comput. Linguistics, 2010

Unsupervised Modeling of Twitter Conversations.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Integrating Joint n-gram Features into a Discriminative Training Framework.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

Imposing Hierarchical Browsing Structures onto Spoken Documents.
Proceedings of the COLING 2010, 2010

Fast and Accurate Arc Filtering for Dependency Parsing.
Proceedings of the COLING 2010, 2010

Unsupervised Morphological Segmentation with Log-Linear Models.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

On the Syllabification of Phonemes.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Cohesive Constraints in A Beam Search Phrase-based Decoder.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009

Discriminative Substring Decoding for Transliteration.
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, 2009

NEWS 2009 Machine Transliteration Shared Task System Description: Transliteration with Letter-to-Phoneme Technology.
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, 2009

A global model for joint lemmatization and part-of-speech prediction.
Proceedings of the ACL 2009, 2009

Discriminative, Syntactic Language Modeling through Latent SVMs.
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers, 2008

Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion.
Proceedings of the ACL 2008, 2008

Cohesive Phrase-Based Decoding for Statistical Machine Translation.
Proceedings of the ACL 2008, 2008

Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion.
Proceedings of the ACL 2008, 2008

Inversion Transduction Grammar for Joint Phrasal Translation Modeling.
Proceedings of the NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation, 2007

A Comparison of Syntactically Motivated Word Alignment Spaces.
Proceedings of the EACL 2006, 2006

Improved Large Margin Dependency Parsing via Local Constraints and Laplacian Regularization.
Proceedings of the Tenth Conference on Computational Natural Language Learning, 2006

Biomedical Term Recognition with the Perceptron HMM Algorithm.
Proceedings of the Workshop on Linking Natural Language and Biology, 2006

Soft Syntactic Constraints for Word Alignment through Discriminative Training.
Proceedings of the ACL 2006, 2006

An Expectation Maximization Approach to Pronoun Resolution.
Proceedings of the Ninth Conference on Computational Natural Language Learning, 2005

Dependency Treelet Translation: Syntactically Informed Phrasal SMT.
Proceedings of the ACL 2005, 2005

ProAlign: Shared Task System Description.
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond, 2003

Word Alignment with Cohesion Constraint.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2003

A Probability Model to Improve Word Alignment.
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, 2003

A Self-modifying Intranet Search Engine based on Past Queries.
Proceedings of the Fifth IASTED International Conference Internet and Multimedia Systems and Applications (IMSA 2001), 2001

Celebration of the 25th Anniversary of Norbert Wiener's Cybernetics.
IEEE Trans. Syst. Man Cybern., 1975

Review: Book Review.
Comput. J., 1963

A New Type of Computer for Problems in Propositional Logic, with Greatly Reduced Scanning Procedures.
Inf. Control., September, 1961
