Nikola Ljubesic
Orcid: 0000-0001-7169-9152
According to our database, Nikola Ljubesic authored at least 116 papers between 2008 and 2024.
Bibliography
2024
Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?
Mach. Learn., July, 2024
Trans. Assoc. Comput. Linguistics, 2024
CLASSLA-Express: a Train of CLARIN.SI Workshops on Language Resources and Tools with Easily Expanding Route.
CoRR, 2024
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification.
CoRR, 2024
Multilingual Power and Ideology Identification in the Parliament: a Reference Dataset and Simple Baselines.
CoRR, 2024
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining.
CoRR, 2024
The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings.
Proceedings of the Speech and Computer - 26th International Conference, 2024
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the Advances in Information Retrieval, 2024
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Do Language Models Care about Text Quality? Evaluating Web-Crawled Corpora across 11 Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
A Lightweight Approach to a Giga-Corpus of Historical Periodicals: The Story of a Slovenian Historical Newspaper Collection.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
2023
Correction: Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine.
Appl. Netw. Sci., December, 2023
Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine.
Appl. Netw. Sci., December, 2023
Nat. Lang. Eng., November, 2023
Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models.
Mach. Learn. Knowl. Extr., June, 2023
Lang. Resour. Evaluation, March, 2023
Frontiers Artif. Intell., February, 2023
CoRR, 2023
CoRR, 2023
ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification.
CoRR, 2023
BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023
Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023
Proceedings of the 19th Workshop on Multiword Expressions, 2023
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023
2022
The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia.
CoRR, 2022
The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022
2021
BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian.
CoRR, 2021
Appl. Netw. Sci., 2021
Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection.
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, 2021
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021
Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization.
Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021
2020
Lang. Resour. Evaluation, 2020
Proceedings of the Fifth Conference on Machine Translation, 2020
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
Proceedings of The 12th Language Resources and Evaluation Conference, 2020
2019
Proceedings of the Using Comparable Corpora for Under-Resourced Areas of Machine Translation, 2019
Proceedings of the Using Comparable Corpora for Under-Resourced Areas of Machine Translation, 2019
How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts.
Nat. Lang. Eng., 2019
KAS-term: Extracting Slovene Terms from Doctoral Theses via Supervised Machine Learning.
Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019
Proceedings of the Text, Speech, and Dialogue - 22nd International Conference, 2019
What does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian.
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019
2018
Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018
Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018
Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings.
Proceedings of The Third Workshop on Representation Learning for NLP, 2018
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018
Proceedings of the 2nd Workshop on Abusive Language Online, 2018
2017
Lang. Resour. Evaluation, 2017
Proceedings of the Fourth Workshop on NLP for Similar Languages, 2017
Proceedings of the Second Workshop on NLP and Computational Social Science, 2017
Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages.
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017
Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene.
Proceedings of the First Workshop on Abusive Language Online, 2017
2016
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016
Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016
Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016
Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016
Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation.
Proceedings of the 13th Conference on Natural Language Processing, 2016
Proceedings of the 13th Conference on Natural Language Processing, 2016
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products, 2016
Dealing with Data Sparseness in SMT with Factored Models and Morphological Expansion: a Case Study on Croatian.
Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 2016
Collaborative Development of a Rule-Based Machine Translator between Croatian and Serbian.
Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 2016
TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data.
Proceedings of the COLING 2016, 2016
Closing a Gap in the Language Resources Landscape: Groundwork and Best Practices from Projects on Computer-mediated Communication in four European Countries.
Proceedings of the Selected papers from the CLARIN Annual Conference 2016, 2016
Proceedings of the 10th Web as Corpus Workshop, 2016
Proceedings of the 2nd Workshop on Noisy User-generated Text, 2016
2015
Informatica (Slovenia), 2015
Informatica (Slovenia), 2015
Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling.
Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015
Proceedings of the Recent Advances in Natural Language Processing, 2015
Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of Morphological Lexicons.
Proceedings of the Recent Advances in Natural Language Processing, 2015
Proceedings of the 18th Annual Conference of the European Association for Machine Translation, 2015
Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015
Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015
2014
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, 2014
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
caWaC - A web corpus of Catalan and its application to language modeling and machine translation.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2014
Proceedings of the 9th Web as Corpus Workshop, 2014
2013
Informatica (Slovenia), 2013
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora, 2013
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, 2013
Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, 2013
2012
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012
Proceedings of the COLING 2012, 2012
2011
Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages.
Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011
Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011
Proceedings of the Recent Advances in Natural Language Processing, 2011
Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction.
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, 2011
2010
Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?
J. Comput. Inf. Technol., 2010
Proceedings of the International Conference on Language Resources and Evaluation, 2010
Proceedings of the International Conference on Language Resources and Evaluation, 2010
2008
Proceedings of the International Conference on Language Resources and Evaluation, 2008
Proceedings of the ITI 2008 30th International Conference on Information Technology Interfaces, 2008