Benoît Sagot
According to our database, Benoît Sagot authored at least 182 papers between 2004 and 2024.
Bibliography
2024
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation.
CoRR, 2024
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck.
CoRR, 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
Proceedings of the COnférence en Recherche d'Informations et Applications, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
When Your Cousin Has the Right Connections: Unsupervised Bilingual Lexicon Induction for Related Data-Imbalanced Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
2023
Trans. Assoc. Comput. Linguistics, 2023
A Simple Method for Unsupervised Bilingual Lexicon Induction for Data-Imbalanced, Closely Related Language Pairs.
CoRR, 2023
Proceedings of the Eighth Conference on Machine Translation, 2023
Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023
Cross-lingual Strategies for Low-resource Language Modeling: A Study on Five Indic Dialects.
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023
Towards a Robust Detection of Language Model-Generated Text: Is ChatGPT that easy to detect?
Proceedings of the Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles, TALN 2023 - Volume 1 : travaux de recherche originaux, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Trans. Assoc. Comput. Linguistics, 2022
Trans. Assoc. Comput. Linguistics, 2022
IEEE J. Sel. Top. Signal Process., 2022
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling.
CoRR, 2022
CoRR, 2022
Inria-ALMAnaCH at WMT 2022: Does Transcription Help Cross-Script Machine Translation?
Proceedings of the Seventh Conference on Machine Translation, 2022
Quand être absent de mBERT n'est que le commencement : Gérer de nouvelles langues à l'aide de modèles de langues multilingues (When Being Unseen from mBERT is just the Beginning : Handling New Languages With Multilingual Language Models).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022
Le projet FREEM : ressources, outils et enjeux pour l'étude du français d'Ancien Régime (The FREEM project: Resources, tools and challenges for the study of Ancien Régime French).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Complex Labelling and Similarity Prediction in Legal Texts: Automatic Analysis of France's Court of Cassation Rulings.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
MANTa: Efficient Gradient-Based Tokenization for End-to-End Robust Language Modeling.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
2021
Sensors, 2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP.
CoRR, 2021
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
CoRR, 2021
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Can Character-based Language Models Improve Downstream Task Performances In Low-Resource And Noisy Language Scenarios?
Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
2020
Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi.
CoRR, 2020
Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
2019
Développement d'un lexique morphologique et syntaxique de l'ancien français (Development of a morphological and syntactic lexicon of Old French).
Proceedings of the Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts, 2019
Proceedings of the 5th Workshop on Noisy User-generated Text, 2019
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019
2018
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018
Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Brussels, Belgium, October 31, 2018
Informatiser le lexique - Modélisation, développement et exploitation de lexiques morphologiques, syntaxiques et sémantiques. (Computerising the lexicon - Modelling, development and use of morphological, syntactic and semantic lexicons).
, 2018
2017
Construction automatique d'une base de données étymologiques à partir du wiktionary (Automatic construction of an etymological database using Wiktionary).
Proceedings of the Actes des 24ème Conférence sur le Traitement Automatique des Langues Naturelles, 2017
Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin.
Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, 2017
Proceedings of the 15th International Conference on Parsing Technologies, 2017
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, 2017
Proceedings of the 11th Linguistic Annotation Workshop, 2017
2016
Étiquetage multilingue en parties du discours avec MElt (Multilingual part-of-speech tagging with MElt).
Proceedings of the Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 2 : TALN (Posters), 2016
From Noisy Questions to Minecraft Texts: Annotation Challenges in Extreme Syntax Scenario.
Proceedings of the 2nd Workshop on Noisy User-generated Text, 2016
2015
Lang. Resour. Evaluation, 2015
2014
Lang. Resour. Evaluation, 2014
J. Lang. Technol. Comput. Linguistics, 2014
Named Entity Recognition and Correction in OCRized Corpora (Détection et correction automatique d'entités nommées dans des corpus OCRisés) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014
Sub-categorization in 'pour' and lexical syntax (Sous-catégorisation en pour et syntaxe lexicale) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014
Analogy-based Text Normalization : the case of unknowns words (Normalisation de textes par analogie: le cas des mots inconnus) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2014
A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
DeLex, a freely-avaible, large-scale and linguistically grounded morphological lexicon for German.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
An Open-Source Heavily Multilingual Translation Graph Extracted from Wiktionaries and Parallel Corpora.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, 2014
2013
Dynamic extension of a French morphological lexicon based a text stream (Extension dynamique de lexiques morphologiques pour le français à partir d'un flux textuel) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2013
Proceedings of the Systems and Frameworks for Computational Morphology, 2013
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2013
Proceedings of the Seventh SIGHAN Workshop on Chinese Language Processing, 2013
2012
Lang. Resour. Evaluation, 2012
Annotation référentielle du Corpus Arboré de Paris 7 en entités nommées (Referential named entity annotation of the Paris 7 French TreeBank) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012
TCOF-POS : un corpus libre de français parlé annoté en morphosyntaxe (TCOF-POS : A Freely Available POS-Tagged Corpus of Spoken French) [in French].
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, 2012
Population of a Knowledge Base for News Metadata from Unstructured Text and Web Data.
Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction, 2012
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Wordnet extension made simple: A multilingual lexicon-based approach using wiki resources.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Boosting the Coverage of a Semantic Lexicon by Automatically Extracted Event Nominalizations.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Proceedings of the COLING 2012, 2012
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012
Proceedings of the Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, 2012
2011
Trait. Autom. des Langues, 2011
Évaluation de lexiques syntaxiques par leur intégartion dans l'analyseur syntaxiques FRMG
CoRR, 2011
Construction d'un lexique des adjectifs dénominaux (Construction of a lexicon of denominal adjectives).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011
Développement de ressources pour le persan : PerLex 2, nouveau lexique morphologique et MEltfa, étiqueteur morphosyntaxique (Development of resources for Persian: PerLex 2, a new morphological lexicon and MEltfa, a morphosyntactic tagger).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011
Un turc mécanique pour les ressources linguistiques : critique de la myriadisation du travail parcellisé (Mechanical Turk for linguistic resources: review of the crowdsourcing of parceled work).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2011
Segmentation et induction de lexique non-supervisées du mandarin (Unsupervised segmentation and induction of mandarin lexicon).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2011
Coopération de méthodes statistiques et symboliques pour l'adaptation non-supervisée d'un système d'étiquetage en entités nommées (Statistical and symbolic methods cooperation for the unsupervised adaptation of a named entity recognition system).
Proceedings of the Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2011
Proceedings of the Systems and Frameworks for Computational Morphology, 2011
Proceedings of the Human Language Technology Challenges for Computer Science and Linguistics, 2011
Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use.
Proceedings of the Human Language Technology Challenges for Computer Science and Linguistics, 2011
Proceedings of the Evaluation of Natural Language and Speech Tools for Italian, 2011
2010
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2010
Développement de ressources pour le persan: lexique morphologique et chaîne de traitements de surface.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2010
Exploitation d'une ressource lexicale pour la construction d'un étiqueteur morpho-syntaxique état-de-l'art du français.
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2010
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2010
Proceedings of the Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2010
Proceedings of the 10th International Workshop on Tree Adjoining Grammar and Related Frameworks, 2010
Proceedings of the International Conference on Language Resources and Evaluation, 2010
Proceedings of the International Conference on Language Resources and Evaluation, 2010
The Lefff, a Freely Available and Large-coverage Morphological and Syntactic Lexicon for French.
Proceedings of the International Conference on Language Resources and Evaluation, 2010
Proceedings of the Fourth Linguistic Annotation Workshop, 2010
Proceedings of the ACL 2010, 2010
Proceedings of the Trends in Parsing Technology, 2010
2009
Proces. del Leng. Natural, 2009
Construcción y extensión de un léxico morfológico y sintáctico para el español: el Leffe.
Proces. del Leng. Natural, 2009
Intégrer les tables du Lexique-Grammaire à un analyseur syntaxique robuste à grande échelle.
Proceedings of the Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2009
Trouver et confondre les coupables : un processus sophistiqué de correction de lexique.
Proceedings of the Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2009
Proceedings of the Recent Advances in Natural Language Processing, 2009
Proceedings of the Recent Advances in Natural Language Processing, 2009
Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009
Building a morphological and syntactic lexicon by merging various linguistic resources.
Proceedings of the 17th Nordic Conference of Computational Linguistics, 2009
MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars (Application Note).
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, Proceedings, May 31, 2009
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2009
Proceedings of the Human Language Technology. Challenges for Computer Science and Linguistics, 2009
Proceedings of the 11th International Workshop on Parsing Technologies (IWPT-2009), 2009
Proceedings of the 11th International Workshop on Parsing Technologies (IWPT-2009), 2009
Proceedings of the Formal Grammar - 14th International Conference, 2009
2008
Trait. Autom. des Langues, 2008
Proces. del Leng. Natural, 2008
Proceedings of the Text, Speech and Dialogue, 11th International Conference, 2008
Proceedings of the Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2008
Proceedings of the COLING 2008, 2008
2007
Comparaison du Lexique-Grammaire des verbes pleins et de DICOVALENCE : vers une intégration dans le Lefff.
Proceedings of the Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2007
Proceedings of the Human Language Technology. Challenges of the Information Society, 2007
Mining Parsing Results for Lexical Correction: Toward a Complete Correction Process of Wide-Coverage Lexicons.
Proceedings of the Human Language Technology. Challenges of the Information Society, 2007
Proceedings of the Tenth International Conference on Parsing Technologies, 2007
2006
Modélisation et analyse des coordinations elliptiques par l'exploitation dynamique des forêts de dérivation.
Proceedings of the Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters, 2006
Proceedings of the Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2006
Modeling and Analysis of Elliptic Coordination by Dynamic Exploitation of Derivation Forests in LTAG Parsing.
Proceedings of the Eighth International Workshop on Tree Adjoining Grammar and Related Formalisms, 2006
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006
Proceedings of the Fifth International Conference on Language Resources and Evaluation, 2006
2005
Proceedings of the Text, Speech and Dialogue, 8th International Conference, 2005
Proceedings of the Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2005
Proceedings of the Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts, 2005
Proceedings of the Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2005
Proceedings of the Logical Aspects of Computational Linguistics, 2005
Proceedings of the Ninth International Workshop on Parsing Technology, 2005
2004
Coupling Grammar and Knowledge Base: Range Concatenation Grammars and Description Logics.
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004
Les Grammaires à Concaténation d'Intervalles (RCG) comme formalisme grammatical pour la linguistique.
Proceedings of the Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs, 2004
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004