Laurent Romary

ORCID: 0000-0002-0756-0508

  • INRIA, Paris, France
  • Humboldt University of Berlin, Centre Marc Bloch, Germany
  • INRIA Team ALMAnaCH, Berlin, Germany
  • INRIA Lorraine, Vandoeuvre-les-Nancy, France

According to our database, Laurent Romary authored at least 140 papers between 1989 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Making Software FAIR: A machine-assisted workflow for the research software lifecycle.
CoRR, January, 2025

How to build an Open Science Monitor based on publications? A French perspective.
CoRR, January, 2025

The Morais Dictionary: Following Best Practices in a Retro-digitized Dictionary Project.
Int. J. Humanit. Arts Comput., 2024

Diachronic Document Dataset for Semantic Layout Analysis.
CoRR, 2024

Harvesting Textual and Structured Data from the HAL Publication Repository.
CoRR, 2024

Evaluating the Effectiveness of Large Language Models in Establishing Conversational Grounding.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Translate your Own: a Post-Editing Experiment in the NLP domain.
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1), 2024

Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

On Modelling Corpus Citations in Computational Lexical Resources.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

A crossroad between lexicography and terminology work: Knowledge organization and domain labelling.
Digit. Scholarsh. Humanit., June, 2023

CamemBERT-bio: a Tasty French Language Model Better for your Health.
CoRR, 2023

CamemBERT-bio : Un modèle de langue français savoureux et meilleur pour la santé.
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023

MaTOS: Traduction automatique pour la science ouverte.
Actes de CORIA-TALN 2023. Actes de l'atelier "Analyse et Recherche de Textes Scientifiques", 2023

ISO LMF 24613-6: A Revised Syntax Semantics Module for the Lexical Markup Framework.
Proceedings of the 4th Conference on Language, Data and Knowledge, 2023

Development of a Normalized Hadith Narrator Encyclopedia with TEI.
Computación y Sistemas, 2022

Integrating Terminological and Ontological Principles into a Lexicographic Resource (poster).
Proceedings of the 1st International Conference on Multilingual Digital Terminology Today, 2022

BERTrade: Using Contextual Embeddings to Parse Old French.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Towards a Cleaner Document-Oriented Multilingual Crawled Corpus.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Modelling Usage Information in a Legacy Dictionary: From TEI Lex-0 to Ontolex-Lemon.
Proceedings of the Workshop on Computational Methods in the Humanities 2022, 2022

From Disparate Disciplines to Unity in Diversity: How the PARTHENOS Project Has Brought European Humanities Research Infrastructures Together.
Int. J. Humanit. Arts Comput., 2021

Arabic factoid Question-Answering system for Islamic sciences using normalized corpora.
Proceedings of the 25th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES-2021), 2021

Curated Archiving of Research Software Artifacts: Lessons Learned from the French Open Archive (HAL).
Int. J. Digit. Curation, 2020

Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity).
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP), 2020

Establishing a New State-of-the-Art for French Named Entity Recognition.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Modelling Etymology in LMF/TEI: The Grande Dicionário Houaiss da Língua Portuguesa Dictionary as a Use Case.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

DeLFT and Entity-fishing: Tools for CLEF HIPE 2020 Shared Task.
Proceedings of the Working Notes of CLEF 2020, 2020

A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

CamemBERT: a Tasty French Language Model.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Leveraging Concepts in Open Access Publications.
J. Data Min. Digit. Humanit., 2019

LMF Reloaded.
CoRR, 2019

Automatic Identification and Normalisation of Physical Measurements in Scientific Literature.
Proceedings of the ACM Symposium on Document Engineering 2019, 2019

SSK by example. Make your Arts and Humanities research go standard.
Proceedings of the 13th Annual International Conference of the Alliance of Digital Humanities Organizations, 2018

Segmentation Tool for Hadith Corpus to Generate TEI Encoding.
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, 2018

The DARIAH ERIC: Redefining Research Infrastructure for the Arts and Humanities in the Digital Age.
ERCIM News, 2017

Encoding Prototype of Al-Hadith Al-Shareef in TEI.
Proceedings of the Arabic Language Processing: From Theory to Practice, 2017

Access To Cultural Heritage Data: A Challenge For The Digital Humanities.
Proceedings of the 12th Annual International Conference of the Alliance of Digital Humanities Organizations, 2017

Nachhaltigkeit durch Zusammenschluss: Die DARIAH Data Re-Use Charter.
Proceedings of the 4. Tagung des Verbands Digital Humanities im deutschsprachigen Raum, 2017

Open Science: Taking Our Destiny into Our Own Hands.
ERCIM News, October, 2016

Data fluidity in DARIAH - pushing the agenda forward.
CoRR, 2016

Deep encoding of etymological information in TEI.
CoRR, 2016

Crowds for Clouds: Recent Trends in Humanities Research Infrastructures.
CoRR, 2016

TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Algebraic Specification for Interoperability Between Data Formats: Application on Arabic Lexical Data.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2016

A New Method for Interoperability Between Lexical Resources Using MDA Approach.
Proceedings of the International Conference on Advanced Intelligent Systems and Informatics, 2016

Projecting LMF Lexica Towards OWL-DL through LMF-JAPE Patterns to Obtain Interoperable Formats.
Res. Comput. Sci., 2015

<tiger2/>: serialising the ISO SynAF syntactic object model.
Lang. Resour. Evaluation, 2015

TEI and LMF crosswalks.
J. Lang. Technol. Comput. Linguistics, 2015

GROBID - Information Extraction from Scientific Publications.
ERCIM News, 2015

Standards for language resources in ISO - Looking back at 13 fruitful years.
CoRR, 2015

Automatic Construction of a TMF Terminological Database using a Transducer Cascade.
Proceedings of the Recent Advances in Natural Language Processing, 2015

Recent Initiatives towards New Standards for Language Resources.
Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology, 2015

EPISCIENCES - An overlay publication platform.
Inf. Serv. Use, 2014

Méthodes pour la représentation informatisée de données lexicales / Methoden der Speicherung lexikalischer Daten.
CoRR, 2014

TBX goes TEI - Implementing a TBX basic extension for the Text Encoding Initiative guidelines.
CoRR, 2014

Natural Language Processing for Historical Texts Michael Piotrowski (Leibniz Institute of European History) Morgan & Claypool (Synthesis Lectures on Human Language Technologies, edited by Graeme Hirst, volume 17), 2012, ix+157 pp; paperbound, ISBN 978-1608459469.
Comput. Linguistics, 2014

Multilinguality in historical documents - challenges and solutions for digital humanities.
Proceedings of the 9th Annual International Conference of the Alliance of Digital Humanities Organizations, 2014

LAUDATIO-Repository: Accessing a heterogeneous field of linguistic corpora with the help of an open access repository.
Proceedings of the 9th Annual International Conference of the Alliance of Digital Humanities Organizations, 2014

Beyond Infrastructure: Modelling Scholarly Research and Collaboration.
Proceedings of the 8th Annual International Conference of the Alliance of Digital Humanities Organizations, 2013

A prototype for projecting HPSG syntactic lexica towards LMF.
J. Lang. Technol. Comput. Linguistics, 2012

Ce qui compte. Méthodes statistiques. Ecrits choisis, tome II. Etienne Brunet (edited by Céline Poudat).
Lit. Linguistic Comput., 2012

Data Management in the Humanities.
ERCIM News, 2012

Standard for morphosyntactic and syntactic corpus annotation: The Morphosyntactic and the Syntactic Annotation Framework, MAF and SynAF.
Proceedings of the 11th Conference on Natural Language Processing, 2012

Collaborative Machine Translation Service for Scientific texts.
Proceedings of the EACL 2012, 2012

Future Developments for TEI ODD.
Proceedings of the 7th Annual International Conference of the Alliance of Digital Humanities Organizations, 2012

Textual Summarisation of Flowcharts in Patent Drawings for CLEF-IP 2012.
Proceedings of the CLEF 2012 Evaluation Labs and Workshop, 2012

The 'application/tei+xml' Media Type.
RFC, February, 2011

Data formats for phonological corpora.
CoRR, 2011

Scholarly Communication.
CoRR, 2011

Constructing DARIAH - the e-Infrastructure for the Arts and Humanities.
Proceedings of the 6th Annual International Conference of the Alliance of Digital Humanities Organizations, 2011

Beyond Institutional Repositories.
Int. J. Digit. Libr. Syst., 2010

Comparing Repository Types: Challenges and Barriers for Subject-Based Repositories, Research Repositories, National Repository Systems and Institutional Repositories in Serving Scholarly Communication.
Int. J. Digit. Libr. Syst., 2010

Stabilizing knowledge through standards - A perspective for the humanities.
CoRR, 2010

HUMB: Automatic Key Term Extraction from Scientific Articles in GROBID.
Proceedings of the 5th International Workshop on Semantic Evaluation, 2010

ISO-TimeML: An International Standard for Semantic Annotation.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

GRISP: A Massive Multilingual Terminological Database for Scientific and Technical Domains.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Towards an ISO Standard for Dialogue Act Annotation.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

Experiments with Citation Mining and Key-Term Extraction for Prior Art Search.
Proceedings of the CLEF 2010 LABs and Workshops, 2010

Representing human and machine dictionaries in Markup languages.
CoRR, 2009

Standardization of the formal representation of lexical information for NLP.
CoRR, 2009

Communication scientifique : Pour le meilleur et pour le PEER.
CoRR, 2009

Towards Multimodal Content Representation.
CoRR, 2009

Dynamically Generated Interfaces in XML Based Architecture.
CoRR, 2009

A Common XML-based Framework for Syntactic Annotations.
CoRR, 2009

Reference Resolution within the Framework of Cognitive Grammar.
CoRR, 2009

A general XML-based distributed software architecture for accessing and sharing resources.
CoRR, 2009

Pattern Based Term Extraction Using ACABIT System.
CoRR, 2009

Encoding models for scholarly literature.
CoRR, 2009

Multiple Retrieval Models and Regression Models for Prior Art Search.
Proceedings of the Working Notes for CLEF 2009 Workshop co-located with the 13th European Conference on Digital Libraries (ECDL 2009), Corfù, Greece, September 30, 2009

PATATRAS: Retrieval Model Combination and Regression Models for Prior Art Search.
Proceedings of the Multilingual Information Access Evaluation I. Text Retrieval Experiments, 2009

Questions & Answers for TEI Newcomers.
CoRR, 2008

Foundation of a Component-based Flexible Registry for Language Resources and Technology.
Proceedings of the International Conference on Language Resources and Evaluation, 2008

OA@MPS - a colourful view.
ZfBB, 2007

A Formal Model of Dictionary Structure and Content.
CoRR, 2007

A lexicon for Vietnamese language processing.
Lang. Resour. Evaluation, 2006

Un modèle générique d'organisation de corpus en ligne: application à la FReeBank.
CoRR, 2006

Unification of multi-lingual scientific terminological resources using the ISO 16642 standard. The TermSciences initiative.
CoRR, 2006

Foundations of Modern Language Resource Archives.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Metadata Profile in the ISO Data Category Registry.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

A Lexicalized Tree-Adjoining Grammar for Vietnamese.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

An API for accessing the Data Category Registry.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Representing Linguistic Corpora and Their Annotations.
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006

Implementing Multilingual Information Framework in Applications Using Textual Display.
Proceedings of the ICEIS 2005, 2005

Multilingual information framework for handling textual data in digital media.
Proceedings of the 2005 International Conference on Active Media Technology, 2005

International standard for a linguistic annotation framework.
Nat. Lang. Eng., 2004

La FREEBANK : vers une base libre de corpus annotés.
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2004

Towards a Reference Annotation Framework.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Experiments on Building Language Resources for Multi-Modal Dialogue Systems.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Online Evaluation of Coreference Resolution.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Developing Tools and Building Linguistic Resources for Vietnamese Morpho-syntactic Processing.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Towards an International Standard on Feature Structure Representation.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Multimodal Meaning Representation for Generic Dialogue Systems Architectures.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

A Registry of Standard Data Categories for Linguistic Annotation.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

The French MEDIA/EVALDA Project: the Evaluation of the Understanding Capability of Spoken Language Dialogue Systems.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Standardization in Multimodal Content Representation: Some Methodological Issues.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

A Large Metadata Domain of Language Resources.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

Handling Multilingual Content in Digital Media: The Multilingual Information Framework.
Proceedings of the Knowledge-Based Media Analysis for Self-Adaptive and Agile Multi-Media, 2004

An Extensible Framework for Efficient Document Management using RDF and OWL.
Proceedings of the Workshop on NLP and XML: RDF/RDFS and OWL in Language Technology, 2004

SYSTRAN new generation: the XML translation workflow.
Proceedings of Machine Translation Summit IX: Papers, 2003

Outline of the International Standard Linguistic Annotation Framework.
Proceedings of the ACL 2003 Workshop on Linguistic Annotation: Getting the Model Right, 2003

Vulcain - An Ontology-Based Information Extraction System.
Proceedings of the Natural Language Processing and Information Systems, 2002

Towards Reusable NLP Components.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Standards for Language Resources.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

LREP: A Language Repository Exchange Protocol.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Referring to Objects with Spoken and Haptic Modalities.
Proceedings of the 4th IEEE International Conference on Multimodal Interfaces (ICMI 2002), 2002

A Common Framework for Syntactic Annotation.
Proceedings of the Association for Computational Linguistics, 2001

XCES: An XML-based Encoding Standard for Linguistic Corpora.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Silfide: A System for Open Access and Distributed Delivery of TEI Encoded Documents.
Comput. Humanit., 1999

A Contextual Analysis of Referring Gestures.
Proceedings of the 4th International Conference on Intelligent User Interfaces, 1999

The Ecological Approach to Multimodal System Design.
Proceedings of the Gesture-Based Communication in Human-Computer Interaction, 1999

Ecological Interfaces: Extending the Pointing Paradigm by Visual Context.
Proceedings of the Modeling and Using Context, 1999

East meets West: multilingual resources in a European context.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Marking-up multiple views of a text: discourse and reference.
Proceedings of the First International Conference on Language Resources and Evaluation, 1998

Veins Theory: A Model of Global Discourse Cohesion and Coherence.
Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, 1998

Structural Analysis of Co-verbal Deictic Gesture in Multimodal Dialogue Systems.
Proceedings of the Progress in Gestural Interaction, 1996

Frames, a unified model for the representation of reference and space in a man-machine dialogue.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

References in a multimodal dialogue: towards a unified processing.
Proceedings of the Second European Conference on Speech Communication and Technology, 1991

Vers la définition d'un modèle cognitif autour de la représentation du temps dans un système de dialogue Homme-Machine.
PhD thesis, 1989

The use of the Dempster-Shafer rule in the lexical component of a man-machine oral dialogue system.
Speech Commun., 1989

Should an oral dialogue system be modular?
Proceedings of the First European Conference on Speech Communication and Technology, 1989
