Simon Clematide

Orcid: 0000-0003-1365-0662

According to our database, Simon Clematide authored at least 85 papers between 2001 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






PARAPHRASUS: A Comprehensive Benchmark for Evaluating Paraphrase Detection Models.
CoRR, 2024

Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models.
CoRR, 2024

New "ArchAIval" Practices: Using GPT for OCR and Historical Narration of Index Cards.
Proceedings of the Linking Theory and Practice of Digital Libraries, 2024

Mapping Work Task Descriptions from German Job Ads on the O*NET Work Activities Ontology.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

UZH_CLyp at SemEval-2023 Task 9: Head-First Fine-Tuning and ChatGPT Data Generation for Cross-Lingual Learning in Tweet Intimacy Prediction.
Proceedings of the 17th International Workshop on Semantic Evaluation, 2023

HIPE-2022 Shared Task Named Entity Datasets.
Dataset, May, 2022

HIPE-2022 Shared Task Named Entity Datasets.
Dataset, March, 2022

HIPE-2022 Shared Task Named Entity Datasets.
Dataset, February, 2022

Transformer-based HTR for Historical Documents.
CoRR, 2022

Evaluation of HTR models without Ground Truth Material.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents.
Proceedings of the Advances in Information Retrieval, 2022

Grenzüberschreitendes Textmining von Historischen Zeitungen - Das impresso-Projekt zwischen Text- und Bildverarbeitung, Design und Geschichtswissenschaft.
Proceedings of the 8. Tagung des Verbands Digital Humanities im deutschsprachigen Raum, 2022

Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2022

Extended Overview of HIPE-2022: Named Entity Recognition and Linking in Multilingual Historical Documents.
Proceedings of the Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, Bologna, Italy, September 5th - to, 2022

On Isotropy Calibration of Transformer Models.
Proceedings of the Third Workshop on Insights from Negative Results in NLP, 2022

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers.
J. Data Min. Digit. Humanit., 2021

On Isotropy Calibration of Transformers.
CoRR, 2021

Searching for Legal Documents at Paragraph Level: Automating Label Generation and Use of an Extended Attention Mask for Boosting Neural Models of Semantic Similarity.
Proceedings of the Natural Legal Language Processing Workshop 2021, 2021

Ranking Georeferences for Efficient Crowdsourcing of Toponym Annotations in a Historical Corpus of Alpine Texts.
Proceedings of the 5th Swiss Text Analytics Conference and the 16th Conference on Natural Language Processing, 2020

CLUZH at SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion.
Proceedings of the 17th SIGMORPHON Workshop on Computational Research in Phonetics, 2020

How Much Data Do You Need? About the Creation of a Ground Truth for Black Letter and the Effectiveness of Neural OCR.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Language Resources for Historical Newspapers: the Impresso Collection.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Introducing the CLEF 2020 HIPE Shared Task: Named Entity Recognition and Linking on Historical Newspapers.
Proceedings of the Advances in Information Retrieval, 2020

Historical Newspaper Content Mining: Revisiting the impresso Project's Challenges in Text and Image Processing, Design and Historical Scholarship.
Proceedings of the 15th Annual International Conference of the Alliance of Digital Humanities Organizations, 2020

Extended Overview of CLEF HIPE 2020: Named Entity Processing on Historical Newspapers.
Proceedings of the Working Notes of CLEF 2020, 2020

Overview of CLEF HIPE 2020: Named Entity Recognition and Linking on Historical Newspapers.
Proceedings of the Experimental IR Meets Multilinguality, Multimodality, and Interaction, 2020

Semi-supervised Contextual Historical Text Normalization.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus.
J. Lang. Technol. Comput. Linguistics, 2018

Supervised OCR Error Detection and Correction Using Statistical and Neural Machine Translation Methods.
J. Lang. Technol. Comput. Linguistics, 2018

Parsing Approaches for Swiss German.
Proceedings of the 3rd Swiss Text Analytics Conference, SwissText 2018, Winterthur, 2018

Strategies and Challenges for Crowdsourcing Regional Dialect Perception Data for Swiss German and Swiss French.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

A Simple and Effective biLSTM Approach to Aspect-Based Sentiment Analysis in Social Media Customer Feedback.
Proceedings of the 14th Conference on Natural Language Processing, 2018

Imitation Learning for Neural Morphological String Transduction.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

UZH at CoNLL-SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection.
Proceedings of the CoNLL SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection, Brussels, October 31, 2018

Neural Transition-based String Transduction for Limited-Resource Setting in Morphology.
Proceedings of the 27th International Conference on Computational Linguistics, 2018

CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects.
Proceedings of the Fourth Workshop on NLP for Similar Languages, 2017

UZH at TAC KBP 2017: Event Nugget Detection via Joint Learning with Softmax-Margin Objective.
Proceedings of the 2017 Text Analysis Conference, 2017

Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities.
Proceedings of the Workshop on Teaching NLP for Digital Humanities (Teach4DH) 2017, 2017

Stance Detection in Facebook Posts of a German Right-wing Party.
Proceedings of the 2nd Workshop on Linking Models of Lexical, 2017

Align and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological Reinflection.
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, 2017

Verb-Mediated Composition of Attitude Relations Comprising Reader and Writer Perspective.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2017

BioCreative V track 4: a shared task for the extraction of causal network information using the Biological Expression Language.
Database J. Biol. Databases Curation, 2016

How Factuality Determines Sentiment Inferences.
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics, 2016

Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations.
Proceedings of the 13th Conference on Natural Language Processing, 2016

Bi-particle Adverbs, PoS-Tagging and the Recognition of German Separable Prefix Verbs.
Proceedings of the 13th Conference on Natural Language Processing, 2016

A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC.
J. Am. Medical Informatics Assoc., 2015

Large-scale Information Extraction for Assisted Curation of the Biomedical Literature.
Proceedings of 1st AI*IA Workshop on Intelligent Techniques At LIbraries and Archives co-located with XIV Conference of the Italian Association for Artificial Intelligence, 2015

OntoGene web services for biomedical text mining.
BMC Bioinform., 2014

Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12.
Database J. Biol. Databases Curation, 2014

Collaboratively Annotating Multilingual Parallel Corpora in the Biomedical Domain - some MANTRAs.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Tagging Complex Non-Verbal German Chunks with Conditional Random Fields.
Proceedings of the 12th Edition of the Konvens Conference, 2014

Detecting Code-Switching in a Multilingual Alpine Heritage Corpus.
Proceedings of the First Workshop on Computational Approaches to Code Switching@EMNLP 2014, 2014

Wozu Kasusrektion auszeichnen bei Präpositionen?
J. Lang. Technol. Comput. Linguistics, 2013

Using the OntoGene pipeline for the triage task of BioCreative 2012.
Database J. Biol. Databases Curation, 2013

A Case Study in Tagging Case in German: An Assessment of Statistical Approaches.
Proceedings of the Systems and Frameworks for Computational Morphology, 2013

A Pilot Study on the Semantic Classification of Two German Prepositions: Combining Monolingual and Multilingual Evidence.
Proceedings of the Recent Advances in Natural Language Processing, 2013

Multilingual Semantic Resources and Parallel Corpora in the Biomedical Domain: the CLEF-ER Challenge.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Entity Recognition in Parallel Multi-lingual Biomedical Corpora: The CLEF-ER Laboratory Overview.
Proceedings of the Information Access Evaluation. Multilinguality, Multimodality, and Visualization, 2013

Deriving an English Biomedical Silver Standard Corpus for CLEF-ER.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Creating Multilingual Gold Standard Corpora for Biomedical Concept Recognition.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

Exploiting BabelNet for Multilingual Biomedical Synonym Expansion.
Proceedings of the Working Notes for CLEF 2013 Conference, 2013

UZH in BioNLP 2013.
Proceedings of the BioNLP Shared Task 2013 Workshop, Sofia, 2013

Relation mining experiments in the pharmacogenomics domain.
J. Biomed. Informatics, 2012

Ranking relations between diseases, drugs and genes for a curation task.
J. Biomed. Semant., 2012

Using ODIN for a PharmGKB revalidation experiment.
Database J. Biol. Databases Curation, 2012

Dependency parsing for interaction detection in pharmacogenomics.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Notes about the OntoGene Pipeline.
Proceedings of the Information Retrieval and Knowledge Discovery in Biomedical Text, 2012

Detection of interaction articles and experimental methods in biomedical literature.
BMC Bioinform., 2011

BioCreative III interactive task: an overview.
BMC Bioinform., 2011

Assessment of NER solutions against the first and second CALBC Silver Standard Corpus.
J. Biomed. Semant., 2011

Semi-automatic test generation for tandem learning.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2011

Ranking Interactions for a Curation Task.
Proceedings of the 10th International Conference on Machine Learning and Applications and Workshops, 2011

An Incremental Model for the Coreference Resolution Task of BioNLP 2011.
Proceedings of BioNLP Shared Task 2011 Workshop, Portland, Oregon, USA, June 24, 2011, 2011

OntoGene in BioCreative II.5.
IEEE ACM Trans. Comput. Biol. Bioinform., 2010

An OLIF-based open inflectional resource and yet another morphological system for German.
Proceedings of the Text Resources and Lexical Knowledge. Selected Papers from the 9th Conference on Natural Language Processing, 2008

CLab - eine web-basierte interaktive Lernplattform für Studierende der Computerlinguistik.
Proceedings of the DeLFI 2007, 2007

GermaNet und UniNet.
LDV Forum, 2004

LUIS - Ein natürlichsprachliches, universitäres Informationssystem.
Proceedings of the Unternehmen Hochschule, 2001

Learn - Filter - Apply - Forget. Mixed Approaches to Named Entity Recognition.
Proceedings of the Applications of Natural Language to Information Systems, 2001

Linguistische und semantische Annotation eines Zeitungskorpus.
Proceedings of the GLDV-Frühjahrstagung 2001, 2001
