Taja Kuzman
Orcid: 0000-0001-7436-9896
According to our database1,
Taja Kuzman
authored at least 13 papers
between 2017 and 2025.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2025
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification.
IEEE Access, 2025
2024
CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route.
CoRR, 2024
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining.
CoRR, 2024
Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
2023
Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models.
Mach. Learn. Knowl. Extr., June, 2023
ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification.
CoRR, 2023
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023
2022
The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022
2017
Proceedings of the Computational and Corpus-Based Phraseology, 2017