Yves Scherrer

Orcid: 0000-0001-5247-5073

According to our database, Yves Scherrer authored at least 65 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.








Democratizing neural machine translation with OPUS-MT.
Lang. Resour. Evaluation, June, 2024

Explainability of Machine Learning Approaches in Forensic Linguistics: A Case Study in Geolinguistic Authorship Profiling.
CoRR, 2024

Hybrid Distillation from RBMT and NMT: Helsinki-NLP's Submission to the Shared Task on Translation into Low-Resource Languages of Spain.
Proceedings of the Ninth Conference on Machine Translation, 2024

Definition generation for lexical semantic change detection.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

The Helsinki-NLP Submissions at NADI 2023 Shared Task: Walking the Baseline.
Proceedings of ArabicNLP 2023, Singapore (Hybrid), December 7, 2023

Dialect Representation Learning with Neural Dialect-to-Standard Normalization.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Findings of the VarDial Evaluation Campaign 2023.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Character alignment methods for dialect-to-standard normalization.
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology, 2023

Dialect-to-Standard Normalization: A Large-Scale Multilingual Evaluation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

CorCoDial - Machine translation techniques for corpus-based computational dialectology.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023

Democratizing Machine Translation with OPUS-MT.
CoRR, 2022

Low Saxon dialect distances at the orthographic and syntactic level.
Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change, 2022

Social Media Variety Geolocation with geoBERT.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Findings of the VarDial Evaluation Campaign 2021.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Boosting Neural Machine Translation from Finnish to Northern Sámi with Rule-Based Backtranslation.
Proceedings of the 23rd Nordic Conference on Computational Linguistics, 2021

Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization.
Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021

Natural language processing for similar languages, varieties, and dialects: A survey.
Nat. Lang. Eng., 2020

The MUCOW word sense disambiguation test suite at WMT 2020.
Proceedings of the Fifth Conference on Machine Translation, 2020

The University of Helsinki and Aalto University submissions to the WMT 2020 news and low-resource translation tasks.
Proceedings of the Fifth Conference on Machine Translation, 2020

LSDC - A comprehensive dataset for Low Saxon Dialect Classification.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

A Report on the VarDial Evaluation Campaign 2020.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

Paraphrase Generation and Evaluation on Colloquial-Style Sentences.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Neural morphosyntactic tagging for Rusyn.
Nat. Lang. Eng., 2019

Digitising Swiss German: how to process and study a polycentric spoken language.
Lang. Resour. Evaluation, 2019

The University of Helsinki Submissions to the WMT19 News Translation Task.
Proceedings of the Fourth Conference on Machine Translation, 2019

The University of Helsinki Submissions to the WMT19 Similar Language Translation Task.
Proceedings of the Fourth Conference on Machine Translation, 2019

The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation.
Proceedings of the Fourth Conference on Machine Translation, 2019

Analysing concatenation approaches to document-level NMT in two different domains.
Proceedings of the Fourth Workshop on Discourse in Machine Translation, 2019

Translational Grounding: Using Paraphrase Recognition and Generation to Demonstrate Semantic Abstraction Abilities of MultiLingual NMT.
CoRR, 2018

The University of Helsinki submissions to the WMT18 news task.
Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018

The WMT'18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English.
Proceedings of the Third Conference on Machine Translation: Shared Task Papers, 2018

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

The University of Helsinki submissions to the IWSLT 2018 low-resource translation task.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

The Helsinki Neural Machine Translation System.
Proceedings of the Second Conference on Machine Translation, 2017

Findings of the VarDial Evaluation Campaign 2017.
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

Multi-source morphosyntactic tagging for spoken Rusyn.
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

Neural Machine Translation with Extended Context.
Proceedings of the Third Workshop on Discourse in Machine Translation, 2017

Lexicon Induction for Spoken Rusyn - Challenges and Results.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

Modernising historical Slovene words.
Nat. Lang. Eng., 2016

Cartopho : un site web de cartographie de variantes de prononciation en français (Cartopho: a website for mapping pronunciation variants in French).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 1 : JEP, 2016

ArchiMob - A Corpus of Spoken Swiss German.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation.
Proceedings of the 13th Conference on Natural Language Processing, 2016

On-line Multilingual Linguistic Services.
Proceedings of the COLING 2016, 2016

Crowdsourced mapping of pronunciation variants in European French.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Unsupervised adaptation of supervised part-of-speech taggers for closely related languages.
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, 2014

A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

SwissAdmin: A multilingual tagged parallel corpus of press releases.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Modernizing historical Slovene words with character-based SMT.
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, 2013

The Trilingual ALLEGRA Corpus: Presentation and Possible Use for Lexicon Induction.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

La traduction automatique des pronoms. Problèmes et perspectives (Automatic translation of pronouns. Problems and perspectives).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011

Étude inter-langues de la distribution et des ambiguïtés syntaxiques des pronoms (A study of cross-language distribution and syntactic ambiguities of pronouns).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011

Morphology Generation for Swiss German Dialects.
Proceedings of the Systems and Frameworks for Computational Morphology, 2011

Des cartes dialectologiques numérisées pour le TALN (Digitized dialectological maps for NLP).
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations, 2010

Natural Language Processing for the Swiss German Dialect Area.
Proceedings of the Semantic Approaches in Natural Language Processing: Proceedings of the 10th Conference on Natural Language Processing, 2010

Word-Based Dialect Identification with Georeferenced Rules.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

Deep Linguistic Multilingual Translation and Bilingual Dictionaries.
Proceedings of the Fourth Workshop on Statistical Machine Translation, 2009

Un système de traduction automatique paramétré par des atlas dialectologiques (A machine translation system parameterized by dialectological atlases).
Proceedings of the Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2009

On-line and off-line translation aids for non-native readers.
Proceedings of the International Multiconference on Computer Science and Information Technology, 2009

Transducteurs à fenêtre glissante pour l'induction lexicale (Sliding-window transducers for lexicon induction).
Proceedings of the Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues, 2008

Adaptive String Distance Measures for Bilingual Dialect Lexicon Induction.
Proceedings of the ACL 2007, 2007
