Taja Kuzman

According to our database, Taja Kuzman authored at least 11 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining.
CoRR, 2024

Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023
Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models.
Mach. Learn. Knowl. Extr., June, 2023

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification.
CoRR, 2023

BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023

2022
The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

2017
Verbal Multiword Expressions in Slovene.
Proceedings of the Computational and Corpus-Based Phraseology, 2017

