François Yvon

Orcid: 0000-0002-7972-7442

  • LIMSI - Computer Science Laboratory for Mechanics and Engineering Sciences, Orsay, France

According to our database, François Yvon authored at least 249 papers between 1995 and 2025.

How Transliterations Improve Crosslingual Alignment.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Unlike "Likely", "Unlike" is Unlikely: BPE-based Segmentation hurts Morphological Derivations in LLMs.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Towards the Machine Translation of Scientific Neologisms.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

Translating scientific abstracts in the bio-medical domain with structure-aware models.
Comput. Speech Lang., 2024

Investigating Length Issues in Document-level Machine Translation.
CoRR, 2024

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment.
CoRR, 2024

Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models.
CoRR, 2024

Optimizing example selection for retrieval-augmented machine translation with translation memories.
CoRR, 2024

Lessons from the Trenches on Reproducible Evaluation of Language Models.
CoRR, 2024

CroissantLLM: A Truly Bilingual French-English Language Model.
CoRR, 2024

À propos des difficultés de traduire automatiquement de longs documents.
Proceedings of the Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, 2024

Vers la traduction automatique des néologismes scientifiques.
Proceedings of the Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, 2024

Optimiser le choix des exemples pour la traduction automatique augmentée par des mémoires de traduction.
Proceedings of the Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, 2024

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Translate your Own: a Post-Editing Experiment in the NLP domain.
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), 2024

GlotScript: A Resource and Tool for Low Resource Writing System Identification.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

MaskLID: Code-Switching Language Identification through Iterative Masking.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

Production automatique de gloses interlinéaires à travers un modèle probabiliste exploitant des alignements.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023

MaTOS: Traduction automatique pour la science ouverte.
Proceedings of the Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques", 2023

LISN @ SIGMORPHON 2023 Shared Task on Interlinear Glossing.
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, 2023

Structural generalization in COGS: Supertagging is (almost) all you need.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Towards Multilingual Interlinear Morphological Glossing.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

GlotLID: Language Identification for Low-Resource Languages.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Towards Example-Based NMT with Multi-Levenshtein Transformers.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Investigating the Translation Performance of a Large Multilingual Language Model: the Case of BLOOM.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023

Integrating Translation Memories into Non-Autoregressive Machine Translation.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Joint Word and Morpheme Segmentation with Bayesian Non-Parametric Models.
Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, 2023

Assessing Word Importance Using Models Trained for Semantic Tasks.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

BiSync: A Bilingual Editor for Synchronized Monolingual Texts.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

Language Report French.
Proceedings of the European Language Equality, 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
CoRR, 2022

Non-Autoregressive Machine Translation with Translation Memories.
CoRR, 2022

Modèle-s bayés-ien-s pour la segment-ation à deux niveau-x faible-ment super-vis-é-e (Bayesian models for weakly supervised two-level segmentation).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

Ré-ordonnancement via programmation dynamique pour l'adaptation cross-lingue d'un analyseur en dépendances (Sentence reordering via dynamic programming for cross-lingual dependency parsing).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022

Latent Group Dropout for Multilingual and Multidomain Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

Evaluating Subtitle Segmentation for End-to-end Generation Systems.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Joint Generation of Captions and Subtitles with Dual Decoding.
Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Bilingual Synchronization: Restoring Translational Relationships with Editing Operations.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Multi-Domain Adaptation in Neural Machine Translation with Dynamic Sampling Strategies.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

Analyzing Gender Translation Errors to Identify Information Flows between the Encoder and Decoder of a NMT System.
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2022

Weakly Supervised Word Segmentation for Computational Language Documentation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Graph Neural Networks for Multiparallel Word Alignment.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Revisiting Multi-Domain Machine Translation.
Trans. Assoc. Comput. Linguistics, 2021

LISN @ WMT 2021.
Proceedings of the Sixth Conference on Machine Translation, 2021

Biais de genre dans un système de traduction automatique neuronale : une étude préliminaire (Gender Bias in Neural Translation : a preliminary study).
Proceedings of the Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2021

Vers la production automatique de sous-titres adaptés à l'affichage (Towards automatic adapted monolingual captioning).
Proceedings of the Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2021

Optimizing Word Alignments with Better Subword Tokenization.
Proceedings of the 18th Biennial Machine Translation Summit - Volume 1: Research Track, 2021

Toward Genre Adapted Closed Captioning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Graph Algorithms for Multiparallel Word Alignment.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

One Source, Two Targets: Challenges and Rewards of Dual Decoding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Screening Gender Transfer in Neural Machine Translation.
Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2021

Can You Traducir This? Machine Translation for Code-Switched Input.
Proceedings of the Fifth Workshop on Computational Approaches to Linguistic Code-Switching, 2021

Transformers in Natural Language Processing.
Proceedings of the Human-Centered Artificial Intelligence, 2021

Priming Neural Machine Translation.
Proceedings of the Fifth Conference on Machine Translation, 2020

A Study of Residual Adapters for Multi-Domain Neural Machine Translation.
Proceedings of the Fifth Conference on Machine Translation, 2020

LIMSI @ WMT 2020.
Proceedings of the Fifth Conference on Machine Translation, 2020

Simplification automatique de texte dans un contexte de faibles ressources (Automatic Text Simplification : Approaching the Problem in Low Resource Settings for French).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020

SimAlign: High Quality Word Alignments without Parallel Training Data using Static and Contextualized Embeddings.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Non-linear n-best List Reranking with Few Features.
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers, 2020

Generative latent neural models for automatic word alignment.
Proceedings of the 14th Conference of the Association for Machine Translation in the Americas, 2020

Quality Estimation for Machine Translation.
Comput. Linguistics, 2019

How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Generic and Specialized Word Embeddings for Multi-Domain Machine Translation.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Neural Baselines for Word Alignment.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Controlling Utterance Length in NMT-based Word Segmentation with Attention.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Measuring text readability with machine comprehension: a pilot study.
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, 2019

Reassessing the proper place of man and machine in translation: a pre-translation scenario.
Mach. Transl., 2018

Using Monolingual Data in Neural Machine Translation: a Systematic Study.
Proceedings of the Third Conference on Machine Translation: Research Papers, 2018

The WMT'18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English.
Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018

Divergences entre annotations dans le projet Universal Dependencies et leur impact sur l'évaluation des performance d'étiquetage morpho-syntaxique (Evaluating Annotation Divergences in the UD Project).
Proceedings of the Actes de la Conférence TALN. CORIA-TALN-RJC 2018 - Volume 1, 2018

Évaluation morphologique pour la traduction automatique : adaptation au français (Morphological Evaluation for Machine Translation : Adaptation to French).
Proceedings of the Actes de la Conférence, 2018

Automatically Selecting the Best Dependency Annotation Design with Dynamic Oracles.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

Exploiting Dynamic Oracles to Train Projective Dependency Parsers on Non-Projective Trees.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Unsupervised Word Segmentation from Speech with Attention.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Bayesian Models for Unit Discovery on a Very Low Resource Language.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Fixing Translation Divergences in Parallel Corpora for Neural MT.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

Quantifying training challenges of dependency parsers.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

Unsupervised Learning of Word Segmentation: Does Tone Matter?
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2018

Learning Morphological Normalization for Translation from and into Morphologically Rich Languages.
Prague Bull. Math. Linguistics, 2017

A comparison of discriminative training criteria for continuous space translation models.
Mach. Transl., 2017

Reassessing the value of resources for cross-lingual transfer of POS tagging models.
Lang. Resour. Evaluation, 2017

The QT21 Combined Machine Translation System for English to Latvian.
Proceedings of the Second Conference on Machine Translation, 2017

Evaluating the morphological competence of Machine Translation Systems.
Proceedings of the Second Conference on Machine Translation, 2017

Proceedings of the Second Conference on Machine Translation, 2017

Word Representations in Factored Neural Machine Translation.
Proceedings of the Second Conference on Machine Translation, 2017

Normalisation automatique du vocabulaire source pour traduire depuis une langue à morphologie riche (Learning Morphological Normalization for Translation from Morphologically Rich Languages).
Proceedings of the Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles, 2017

Adaptation au domaine pour l'analyse morpho-syntaxique (Domain Adaptation for PoS tagging).
Proceedings of the Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles. Orléans, France, June 26-30, 2017, Volume 2, 2017

Learning the Structure of Variable-Order CRFs: a finite-state perspective.
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 2017

Don't Stop Me Now! Using Global Dynamic Oracles to Correct Training Biases of Transition-Based Dependency Parsers.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

LIMSI@CoNLL'17: UD Shared Task.
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 2017

Apprentissage discriminant de modèles neuronaux pour la traduction automatique.
Trait. Autom. des Langues, 2016

Reordering space design in statistical machine translation.
Lang. Resour. Evaluation, 2016

LIMSI's Contribution to the WMT'16 Biomedical Translation Task.
Proceedings of the First Conference on Machine Translation, 2016

LIMSI@WMT'16: Machine Translation of News.
Proceedings of the First Conference on Machine Translation, 2016

Lecture bilingue augmentée par des alignements multi-niveaux (Augmenting bilingual reading with alignment information).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. volume 5 : Démonstrations, 2016

Apprentissage d'analyseur en dépendances cross-lingue par projection partielle de dépendances (Cross-lingual learning of dependency parsers from partially projected dependencies).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 2 : TALN (Articles longs), 2016

Ne nous arrêtons pas en si bon chemin : améliorations de l'apprentissage global d'analyseurs en dépendances par transition (Don't Stop Me Now ! Improved Update Strategies for Global Training of Transition-Based).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 2 : TALN (Articles longs), 2016

TransRead: Designing a Bilingual Reading Experience with Machine Translation Technologies.
Proceedings of the Demonstrations Session, 2016

Frustratingly Easy Cross-Lingual Transfer for Transition-Based Dependency Parsing.
Proceedings of the NAACL HLT 2016, 2016

Novel elicitation and annotation schemes for sentential and sub-sentential alignments of bitexts.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Cross-lingual and Supervised Models for Morphosyntactic Annotation: a Comparison on Romanian.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Proceedings of the 13th International Conference on Spoken Language Translation, 2016

Two-Step MT: Predicting Target Morphology.
Proceedings of the 13th International Conference on Spoken Language Translation, 2016

Preliminary Experiments on Unsupervised Word Discovery in Mboshi.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Parallel Sentence Compression.
Proceedings of the COLING 2016, 2016

Zero-resource Dependency Parsing: Boosting Delexicalized Cross-lingual Transfer with Linguistic Knowledge.
Proceedings of the COLING 2016, 2016

Why Predicting Post-Edition is so Hard? Failure Analysis of LIMSI Submission to the APE Shared Task.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

LIMSI@WMT'15: Translation Task.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

The KIT-LIMSI Translation System for WMT 2015.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

Oublier ce qu'on sait, pour mieux apprendre ce qu'on ne sait pas : une étude sur les contraintes de type dans les modèles CRF.
Proceedings of the Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2015

Apprentissage par imitation pour l'étiquetage de séquences : vers une formalisation des méthodes d'étiquetage easy-first.
Proceedings of the Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2015

Apprentissage discriminant des modèles continus de traduction.
Proceedings of the Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2015

Morphology-aware alignments for translation to and from a synthetic language.
Proceedings of the 12th International Workshop on Spoken Language Translation: Papers, 2015

Structured prediction for speaker identification in TV series.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A Discriminative Training Procedure for Continuous Translation Models.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Issues in Analogical Inference Over Sequences of Symbols: A Case Study on Proper Name Transliteration.
Proceedings of the Computational Approaches to Analogical Reasoning: Current Trends, 2014

Traduire la parole : le cas des TED Talks.
Trait. Autom. des Langues, 2014

Maximum-entropy word alignment and posterior-based phrase extraction for machine translation.
Mach. Transl., 2014

LIMSI Submission for WMT'14 QE Task.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

LIMSI @ WMT'14 Medical Translation Task.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

The KIT-LIMSI Translation System for WMT 2014.
Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014

Cross-Lingual POS Tagging through Ambiguous Learning: First Experiments (Apprentissage partiellement supervisé d'un étiqueteur morpho-syntaxique par transfert cross-lingue) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

(Much) Faster Construction of SMT Phrase Tables from Large-scale Parallel Corpora (Construction (très) rapide de tables de traduction à partir de grands bi-textes) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

Towards a More Efficient Development of Statistical Machine Translation Systems (Vers un développement plus efficace des systèmes de traduction statistique : un peu de vert dans un monde de BLEU) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

Topic Adaptation for the Automatic Translation of News Articles (Adaptation thématique pour la traduction automatique de dépêches de presse) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

Comparison of scheduling methods for the learning rate of neural network language models (Modèles de langue neuronaux: une comparaison de plusieurs stratégies d'apprentissage) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014

A Corpus of Machine Translation Errors Extracted from Translation Students Exercises.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Rule-based Reordering Space in Statistical Machine Translation.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

LIMSI English-French speech translation system.
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2014, 2014

Incremental development of statistical machine translation systems.
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, 2014

Discriminative adaptation of continuous space translation models.
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, 2014

Cross-Lingual Part-of-Speech Tagging through Ambiguous Learning.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Combining techniques from different NN-based language models for machine translation.
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track, 2014

Lattice BLEU oracles in machine translation.
ACM Trans. Speech Lang. Process., 2013

Structured Output Layer Neural Network Language Models for Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

Traitement automatique des entités nommées en arabe : détection et traduction.
Trait. Autom. des Langues, 2013

Oracle decoding as a new way to analyze phrase-based machine translation.
Mach. Transl., 2013

Quality estimation for machine translation: some lessons learned.
Mach. Transl., 2013

Generalizing sampling-based multilingual alignment.
Mach. Transl., 2013

Fast Large-Margin Learning for Statistical Machine Translation.
Int. J. Comput. Linguistics Appl., 2013

LIMSI Submission for the WMT'13 Quality Estimation Task: an Experiment with N-Gram Posteriors.
Proceedings of the Eighth Workshop on Statistical Machine Translation, 2013

Proceedings of the Eighth Workshop on Statistical Machine Translation, 2013

A corpus of post-edited translations (Un corpus d'erreurs de traduction) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2013

A fully discriminative training framework for Statistical Machine Translation (Un cadre d'apprentissage intégralement discriminant pour la traduction statistique) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2013

Design and Analysis of a Large Corpus of Post-Edited Translations: Quality Estimation, Failure Analysis and the Variability of Post-Edition.
Proceedings of Machine Translation Summit XIV: Papers, 2013

Improving bilingual sub-sentential alignment by sampling-based transpotting.
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, 2013

Structure learning in hidden conditional random fields for grapheme-to-phoneme conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Discriminative training of a phoneme confusion model for a dynamic lexicon in ASR.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Trait. Autom. des Langues, 2012

Non-Linear Models for Confidence Estimation.
Proceedings of the Seventh Workshop on Statistical Machine Translation, 2012

Proceedings of the Seventh Workshop on Statistical Machine Translation, 2012

Alignement sous-phrastique hiérarchique avec Anymalign (Hierarchical Sub-Sentential Alignment with Anymalign) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

Repérage des entités nommées pour l'arabe : adaptation non-supervisée et combinaison de systèmes (Named Entity Recognition for Arabic : Unsupervised adaptation and Systems combination) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012

WSD for n-best reranking and local language modeling in SMT.
Proceedings of the Sixth Workshop on Syntax, 2012

Measuring the Influence of Long Range Dependencies with Neural Network Language Models.
Proceedings of the Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, 2012

Continuous Space Translation Models with Neural Networks.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Towards contextual adaptation for any-text translation.
Proceedings of the 2012 International Workshop on Spoken Language Translation, 2012

Hierarchical Sub-sentential Alignment with Anymalign.
Proceedings of the 16th Annual conference of the European Association for Machine Translation, 2012

Computing Lattice BLEU Oracle Scores for Machine Translation.
Proceedings of the EACL 2012, 2012

Aligning Bilingual Literary Works: a Pilot Study.
Proceedings of the Workshop on Computational Linguistics for Literature, 2012

Probabilistic Models: An Introduction.
Proceedings of the Textual Information Access: Statistical Models, 2012

Statistical Methods for Machine Translation.
Proceedings of the Textual Information Access: Statistical Models, 2012

Ncode: an Open Source Bilingual N-gram SMT Toolkit.
Prague Bull. Math. Linguistics, 2011

Filtering artificial texts with statistical machine learning techniques.
Lang. Resour. Evaluation, 2011

Text segmentation: A topic modeling perspective.
Inf. Process. Manag., 2011

Designing an Improved Discriminative Word Aligner.
Int. J. Comput. Linguistics Appl., 2011

From n-gram-based to CRF-based Translation Models.
Proceedings of the Sixth Workshop on Statistical Machine Translation, 2011

Estimation d'un modèle de traduction à partir d'alignements mot-à-mot non-déterministes (Estimating a translation model from non-deterministic word-to-word alignments).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2011

Généralisation de l'alignement sous-phrastique par échantillonnage (Generalization of sub-sentential alignment by sampling).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2011

How good are your phrases? assessing phrase quality with single class classification.
Proceedings of the 2011 International Workshop on Spoken Language Translation, 2011

LIMSI's experiments in domain adaptation for IWSLT11.
Proceedings of the 2011 International Workshop on Spoken Language Translation, 2011

Advances on spoken language translation in the Quaero program.
Proceedings of the 2011 International Workshop on Spoken Language Translation, 2011

Large Vocabulary SOUL Neural Network Language Models.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Structured Output Layer neural network language model.
Proceedings of the IEEE International Conference on Acoustics, 2011

Measuring the Confusability of Pronunciations in Speech Recognition.
Proceedings of the Finite-State Methods and Natural Language Processing, 2011

Discriminative Weighted Alignment Matrices For Statistical Machine Translation.
Proceedings of the 15th Annual conference of the European Association for Machine Translation, 2011

Minimum Error Rate Training Semiring.
Proceedings of the 15th Annual conference of the European Association for Machine Translation, 2011

Two Ways to Use a Noisy Parallel News Corpus for Improving Statistical Machine Translation.
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, 2011

Lexical Micro-adaptation in Statistical Machine Translation.
Trait. Autom. des Langues, 2010

Rewriting the orthography of SMS messages.
Nat. Lang. Eng., 2010

Factored bilingual n-gram language models for statistical machine translation.
Mach. Transl., 2010

Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labeling.
IEEE J. Sel. Top. Signal Process., 2010

LIMSI's Statistical Translation Systems for WMT'10.
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, 2010

Recueil et analyse d'un corpus écologique de corrections orthographiques extrait des révisions de Wikipédia.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2010

Contrastive Lexical Evaluation of Machine Translation.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

The pay-offs of preprocessing for German-English statistical machine translation.
Proceedings of the 2010 International Workshop on Spoken Language Translation, 2010

Proceedings of the 2010 International Workshop on Spoken Language Translation, 2010

Assessing Phrase-Based Translation Models with Oracle Decoding.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

Training Continuous Space Language Models: Some Practical Issues.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

Improving Reordering with Linguistically Informed Bilingual n-grams.
Proceedings of the COLING 2010, 2010

Local lexical adaptation in Machine Translation through triangulation: SMT helping SMT.
Proceedings of the COLING 2010, 2010

Refining Word Alignment with Discriminative Training.
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers, 2010

Practical Very Large Scale CRFs.
Proceedings of the ACL 2010, 2010

Selecting features with L1 regularization in Conditional Random Fields.
Trait. Autom. des Langues, 2009

Efficient Learning of Sparse Conditional Random Fields for Supervised Sequence Labelling.
CoRR, 2009

LIMSI's Statistical Translation Systems for WMT'09.
Proceedings of the Fourth Workshop on Statistical Machine Translation, 2009

Plusieurs langues (bien choisies) valent mieux qu'une : traduction statistique multi-source par renforcement lexical.
Proceedings of the Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2009

Gappy Translation Units under Left-to-Right SMT Decoding.
Proceedings of the 13th Annual conference of the European Association for Machine Translation, 2009

Improvements in Analogical Learning: Application to Translating Multi-Terms of the Medical Domain.
Proceedings of the EACL 2009, 12th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, Athens, Greece, March 30, 2009

Text segmentation via topic modeling: an analytical study.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

Limsi's Statistical Translation Systems for WMT'08.
Proceedings of the Third Workshop on Statistical Machine Translation, 2008

Appariement d'entités nommées coréférentes : combinaisons de mesures de similarité par apprentissage supervisé.
Proceedings of the Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2008

Transcrire les SMS comme on reconnaît la parole.
Proceedings of the Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2008

Analogical Translation of Medical Words in Different Languages.
Proceedings of the Advances in Natural Language Processing, 2008

The asymptotics of semi-supervised learning in discriminative probabilistic models.
Proceedings of the Machine Learning, 2008

Detecting Fake Content with Relative Entropy Scoring.
Proceedings of the ECAI'08 Workshop on Uncovering Plagiarism, 2008

Using LDA to detect semantically incoherent documents.
Proceedings of the Twelfth Conference on Computational Natural Language Learning, 2008

Robust Similarity Measures for Named Entities Matching.
Proceedings of the COLING 2008, 2008

Scaling up Analogical Learning.
Proceedings of the COLING 2008, 2008

Normalizing SMS: are Two Metaphors Better than One ?
Proceedings of the COLING 2008, 2008

Inference and evaluation of the multinomial mixture model for text clustering.
Inf. Process. Manag., 2007

Adaptive database reduction for domain specific speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Engineering multimedia applications on the basis of multi-structured descriptions of audiovisual contents.
Proceedings of the International Workshop On Semantically Aware Document Processing And Indexing, 2007

Optimization on decoding graphs by discriminative training.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Approaches for adaptive database reduction for text-to-speech synthesis.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Du quatrième de proportion comme principe inductif : une proposition et son application à l'apprentissage de la morphologie.
Trait. Autom. des Langues, 2006

Productivité quantitative des suffixations par -ité et -Able dans un corpus journalistique moderne.
Proceedings of the Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2006

Corpus design based on the Kullback-Leibler divergence for text-to-speech synthesis application.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Discriminative training of finite state decoding graphs.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

On the use of morphological constraints in n-gram statistical language model.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

An Analogical Learner for Morphological Analysis.
Proceedings of the Ninth Conference on Computational Natural Language Learning, 2005

Apprentissage par analogie et rapports de proportion : contributions méthodologiques et expérimentales.
Proceedings of the Actes de CAP 05, Conférence francophone sur l'apprentissage automatique, 2005

Arc minimization in finite-state decoding graphs with cross-word acoustic context.
Comput. Speech Lang., 2004

Analogies dans les séquences : un solveur à états finis.
Proceedings of the Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Posters, 2004

Automating Indexing of Classes and Conferences.
Proceedings of the Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications), 2004

Apprentissage Automatique de Paraphrases pour l'Amélioration d'un Système de Questions-Réponses.
Proceedings of the Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2003

Proper Names Extraction from Fax Images Combining Textual and Image Features.
Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR 2003), 2003

Improving Rocchio with Weakly Supervised Clustering.
Proceedings of the Machine Learning: ECML 2003, 2003

Using the Web as a Linguistic Resource for Learning Reformulations Automatically.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Integrating contextual phonological rules in a large vocabulary decoder.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

A French Phonetic Lexicon with Variants for Speech and Language Processing.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

The hidden dimension: a paradigmatic view of data-driven NLP.
J. Exp. Theor. Artif. Intell., 1999

Pronouncing unknown words using multi-dimensional analogies.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Objective evaluation of grapheme to phoneme conversion for text-to-speech synthesis in French.
Comput. Speech Lang., 1998

Evaluation of grapheme-to-phoneme conversion for text-to-speech synthesis in French.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Paradigmatic Cascades: a Linguistically Sound Model of Pronunciation by Analogy.
Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, 1997

Grapheme-to-Phoneme Conversion using Multiple Unbounded Overlapping Chunks.
CoRR, 1996

Introducing statistical dependencies and structural constraints in variable-length sequence models.
Proceedings of the Grammatical Inference: Learning Syntax from Sentences, 1996

Variable-length sequence matching for phonetic transcription using joint multigrams.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

A dynamic approach to paradigm-driven analogy.
Proceedings of the Connectionist, 1995
