Serge Sharoff

Orcid: 0000-0002-4877-0210

According to our database, Serge Sharoff authored at least 88 papers between 1996 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Making Democratic Deliberation and Participation more Accessible: The iDEM Project.
Proceedings of the Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations (SEPLN-CEDI-PD 2024) co-located with the 7th Spanish Conference on Informatics (CEDI 2024), 2024

Quantifying the Contribution of MWEs and Polysemy in Translation Errors for English-Igbo MT.
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), 2024

Enhancing Image-to-Text Generation in Radiology Reports through Cross-modal Multi-Task Learning.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Fine-tuning language models to recognize semantic relations.
Lang. Resour. Evaluation, December, 2023

Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation.
CoRR, 2023

Syntactic Knowledge via Graph Attention with BERT in Machine Translation.
CoRR, 2023

GATology for Linguistics: What Syntactic Dependencies It Knows.
CoRR, 2023

FTD at SemEval-2023 Task 3: News Genre and Propaganda Detection by Comparing Mono- and Multilingual Models with Fine-tuning on Additional Data.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task.
CoRR, 2022

Towards Arabic Sentence Simplification via Classification and Generative Approaches.
CoRR, 2022

Towards Arabic Sentence Simplification via Classification and Generative Approaches.
Proceedings of the Seventh Arabic Natural Language Processing Workshop, 2022

Multimodal Pipeline for Collection of Misinformation Data from Telegram.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

BERTology for Machine Translation: What BERT Knows about Linguistic Difficulties for Translation.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Applying Natural Annotation and Curriculum Learning to Named Entity Recognition for Under-Resourced Languages.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Experiments with adversarial attacks on text genres.
CoRR, 2021

Automatic Difficulty Classification of Arabic Sentences.
Proceedings of the Sixth Arabic Natural Language Processing Workshop, 2021

Finding next of kin: Cross-lingual embedding spaces for related languages.
Nat. Lang. Eng., 2020

Sentence Level Human Translation Quality Estimation with Attention-based Neural Networks.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Know thy Corpus! Robust Methods for Digital Curation of Web corpora.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Recognizing Semantic Relations by Combining Transformers and Fully Connected Models.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Recognizing Semantic Relations: Attention-Based Transformers vs. Recurrent Models.
Proceedings of the Advances in Information Retrieval, 2020

Overview of the Fourth BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora.
Proceedings of the 13th Workshop on Building and Using Comparable Corpora, 2020

New Areas of Application of Comparable Corpora.
Proceedings of the Using Comparable Corpora for Under-Resourced Areas of Machine Translation, 2019

Towards Functionally Similar Corpus Resources for Translation.
Proceedings of the International Conference on Recent Advances in Natural Language Processing, 2019

Overview of the Third BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora.
Proceedings of the 11th Workshop on Building and Using Comparable Corpora, 2018

A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Investigating the Influence of Bilingual MWU on Trainee Translation Quality.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Cross-lingual Terminology Extraction for Translation Quality Estimation.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Language adaptation experiments via cross-lingual embeddings for related languages.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Overview of the Second BUCC Shared Task: Spotting Parallel Sentences in Comparable Corpora.
Proceedings of the 10th Workshop on Building and Using Comparable Corpora, 2017

Toward Pan-Slavic NLP: Some Experiments with Language Adaptation.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

Language Adaptation for Extending Post-Editing Estimates for Closely Related Languages.
Prague Bull. Math. Linguistics, 2016

Recent advances in machine translation using comparable corpora.
Nat. Lang. Eng., 2016

Nat. Lang. Eng., 2016

Crowdsourcing for web genre annotation.
Lang. Resour. Evaluation, 2016

MoBiL: A Hybrid Feature Set for Automatic Human Translation Quality Assessment.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Adam Kilgarriff's Legacy to Computational Linguistics and Beyond.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2016

Genre classification for a corpus of academic webpages.
Proceedings of the 10th Web as Corpus Workshop, 2016

Web Corpus Construction Roland Schäfer and Felix Bildhauer (Freie Universität Berlin) Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 22), 2013, 145 pages, paper-bound, ISBN 9781608459834, doi: 10.2200/S00508ED1V01Y201305HLT022.
Comput. Linguistics, 2015

Large Scale Translation Quality Estimation.
Proceedings of the 1st Deep Machine Translation Workshop, 2015

BUCC Shared Task: Cross-Language Document Similarity.
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora, 2015

Obtaining SMT dictionaries for related languages.
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora, 2015

Applying Multi-Dimensional Analysis to a Russian Webcorpus: Searching for Evidence of Genres.
Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015

Introduction to the special issue on Resources and Tools for Language Learners.
Lang. Resour. Evaluation, 2014

Corpus-based vocabulary lists for language learners for nine languages.
Lang. Resour. Evaluation, 2014

Document dissimilarity within and across languages: A benchmarking study.
Lit. Linguistic Comput., 2014

Semi-supervised Graph-based Genre Classification for Web Pages.
Proceedings of TextGraphs@EMNLP 2014: the 9th Workshop on Graph-based Methods for Natural Language Processing, 2014

Designing and Evaluating a Reliable Corpus of Web Genres via Crowd-Sourcing.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Extracting Multiword Translations from Aligned Comparable Documents.
Proceedings of the 3rd Workshop on Hybrid Approaches to Machine Translation, 2014

Multiple views as aid to linguistic annotation error analysis.
Proceedings of the 8th Linguistic Annotation Workshop, 2014

SentiML: functional annotation for multilingual sentiment analysis.
Proceedings of the 1st International Workshop on Collaborative Annotations in Shared Environment, 2013

English-to-Russian MT evaluation campaign.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora.
Proceedings of the Building and Using Comparable Corpora, 2013

Measuring the Distance Between Comparable Corpora Between Languages.
Proceedings of the Building and Using Comparable Corpora, 2013

Identifying Word Translations from Comparable Documents Without a Seed Lexicon.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Design of a hybrid high quality machine translation system.
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation HyTra@EACL 2012, 2012

In the Garden and in the Jungle.
Proceedings of the Genres on the Web, 2011

Any Land in Sight?
Proceedings of the Genres on the Web, 2011

Riding the Rough Waves of Genre on the Web.
Proceedings of the Genres on the Web, 2011

Multiword expressions: hard going or plain sailing?
Lang. Resour. Evaluation, 2010

Using an integrated feature set to generalize and justify the Chinese-to-English transferring rule of the 'ZHE' aspect.
J. Zhejiang Univ. Sci. C, 2010

Advanced Corpus Solutions for Humanities Researchers.
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, 2010

The Web Library of Babel: evaluating genre collections.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Fine-Grained Genre Classification Using Structural Learning Algorithms.
Proceedings of the ACL 2010, 2010

'Irrefragable answers' using comparable corpora to retrieve translation equivalents.
Lang. Resour. Evaluation, 2009

Web Genre Benchmark Under Construction.
J. Lang. Technol. Comput. Linguistics, 2009

Evaluation-Guided Pre-Editing of Source Text: Improving MT-Tractability of Light Verb Constructions.
Proceedings of the 13th Annual conference of the European Association for Machine Translation, 2009

Designing and Evaluating a Russian Tagset.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Corpus-Based Tools for Computer-Assisted Acquisition of Reading Abilities in Cognate Languages.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Cleaneval: a Competition for Cleaning Web Pages.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Generalising Lexical Translation Strategies for MT Using Comparable Corpora.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

Linguistic support for concept selection decisions.
Artif. Intell. Eng. Des. Anal. Manuf., 2007

Translating from under-resourced languages: comparing direct transfer against pivot translation.
Proceedings of Machine Translation Summit XI: Papers, 2007

Assisting Translators in Indirect Lexical Transfer.
Proceedings of the ACL 2007, 2007

Using collocations from comparable corpora to find translation equivalents.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

A Uniform Interface to Large-Scale Linguistic Resources.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Using Richly Annotated Trilingual Language Resources for Acquiring Reading Skills in a Foreign Language.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

ASSIST: Automated Semantic Assistance for Translators.
Proceedings of the EACL 2006, 2006

Using Comparable Corpora to Solve Problems Difficult for Human Translators.
Proceedings of the ACL 2006, 2006

Towards Basic Categories for Describing Properties of Texts in a Corpus.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Meaning as use: exploitation of aligned corpora for the contrastive study of lexical semantics.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Concordancing for parallel spoken language corpora.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Resources for Multilingual Text Generation in Three Slavic Languages.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Multilinguality in a Text Generation System For Three Slavic Languages.
Proceedings of the COLING 2000, 18th International Conference on Computational Linguistics, Proceedings of the Conference, 2 Volumes, July 31, 2000

Register-domain Separation as a Methodology for Development of Natural Language Interfaces to Databases.
Proceedings of the Human-Computer Interaction INTERACT '99: IFIP TC13 International Conference on Human-Computer Interaction, 1999

Understanding Short Texts with Integration of Knowledge Representation Methods.
Proceedings of the Perspectives of System Informatics, 1996
