Pedro Ortiz Suarez
ORCID: 0000-0003-0343-8852
According to our database, Pedro Ortiz Suarez authored at least 26 papers between 2020 and 2024.
Bibliography
2024
Occiglot at WMT24: European Open-source Large Language Models Evaluated on Translation.
Proceedings of the Ninth Conference on Machine Translation, 2024
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
A CURATEd CATalog: Rethinking the Extraction of Pretraining Corpora for Mid-Resourced Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
2023
Semi-automatic staging area for high-quality structured data extraction from scientific literature.
CoRR, 2023
2022
A Data-driven Approach to Natural Language Processing for Contemporary and Historical French. (Une approche basée sur les données pour le traitement automatique du langage naturel en français contemporain et historique).
PhD thesis, 2022
Trans. Assoc. Comput. Linguistics, 2022
Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data.
CoRR, 2022
Automatic Extraction of Materials and Properties from Superconductors Scientific Literature.
CoRR, 2022
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources.
CoRR, 2022
Le projet FREEM : ressources, outils et enjeux pour l'étude du français d'Ancien Régime (The FreEM project: Resources, tools and challenges for the study of Ancien Régime French).
Proceedings of the Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the 29th International Conference on Computational Linguistics, 2022
2020
Les modèles de langue contextuels Camembert pour le français : impact de la taille et de l'hétérogénéité des données d'entrainement (CamemBERT Contextual Language Models for French: Impact of Training Data Size and Heterogeneity).
Proceedings of the Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
SinNer@Clef-Hipe2020 : Sinful adaptation of SotA models for Named Entity Recognition in French and German.
Proceedings of the Working Notes of CLEF 2020, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020