Nikola Ljubesic

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

CLASSLA-web: Comparable Web Corpora of South Slavic Languages Enriched with Linguistic and Genre Annotation.

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

A Lightweight Approach to a Giga-Corpus of Historical Periodicals: The Story of a Slovenian Historical Newspaper Collection.

Filip Dobranic

Bojan Evkoski

Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

2023

Correction: Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine.

Bojan Evkoski

Petra Kralj Novak

Appl. Netw. Sci., December, 2023

Content-based comparison of communities in social networks: Ex-Yugoslavian reactions to the Russian invasion of Ukraine.

Bojan Evkoski

Petra Kralj Novak

Appl. Netw. Sci., December, 2023

Quantifying the impact of context on the quality of manual hate speech annotation.

Igor Mozetic

Petra Kralj Novak

Nat. Lang. Eng., November, 2023

Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models.

Igor Mozetic

Mach. Learn. Knowl. Extr., June, 2023

The ParlaMint corpora of parliamentary proceedings.

Lang. Resour. Evaluation, March, 2023

Who are the haters? A corpus-based demographic analysis of authors of hate speech.

Frontiers Artif. Intell., February, 2023

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark.

Joseph Marvin Imperial

CoRR, 2023

CLASSLA-Stanza: The Next Step for Linguistic Processing of South Slavic Languages.

Luka Tercon

CoRR, 2023

ChatGPT: Beginning of an End of Manual Linguistic Data Annotation? Use Case of Automatic Genre Identification.

Igor Mozetic

CoRR, 2023

BENCHić-lang: A Benchmark for Discriminating between Bosnian, Croatian, Montenegrin and Serbian.

Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Get to Know Your Parallel Data: Performing English Variety and Genre Classification over MaCoCu Corpora.

Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Findings of the VarDial Evaluation Campaign 2023.

Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

PARSEME corpus release 1.3.

Proceedings of the 19th Workshop on Multiword Expressions, 2023

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.

Aarón Galiano Jiménez

Jaume Zaragoza-Bernabeu

Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023

2022

The ParlaSent-BCS dataset of sentiment-annotated parliamentary debates from Bosnia-Herzegovina, Croatia, and Serbia.

Michal Mochtak

CoRR, 2022

The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.

Marta Bañón

Miquel Esplà-Gomis

Mikel L. Forcada

Cristian García-Romero

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

2021

The KAS corpus of Slovenian academic writing.

Lang. Resour. Evaluation, 2021

Retweet communities reveal the main sources of hate speech.

CoRR, 2021

Community evolution in retweet networks.

CoRR, 2021

BERTić - The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian.

Davor Lauc

CoRR, 2021

Evolution of topics and hate speech in retweet network communities.

Appl. Netw. Sci., 2021

Exploring Stylometric and Emotion-Based Features for Multilingual Cross-Domain Hate Speech Detection.

Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, 2021

Social Media Variety Geolocation with geoBERT.

Bharathi Raja Chakravarthi

Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Findings of the VarDial Evaluation Campaign 2021.

Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Cultural Topic Modelling over Novel Wikipedia Corpora for South-Slavic Languages.

Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021

Sesame Street to Mount Sinai: BERT-constrained character-level Moses models for multilingual lexical normalization.

Proceedings of the Seventh Workshop on Noisy User-generated Text, 2021

2020

The Janes project: language resources and tools for Slovene user generated content.

Lang. Resour. Evaluation, 2020

Findings of the 2020 Conference on Machine Translation (WMT20).

Proceedings of the Fifth Conference on Machine Translation, 2020

HeLju@VarDial 2020: Social Media Variety Geolocation with BERT Models.

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

A Report on the VarDial Evaluation Campaign 2020.

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

SemEval-2020 Task 3: Graded Word Similarity in Context.

Carlos Santos Armendariz

Mohammad Taher Pilehvar

Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

Gigafida 2.0: The Reference Corpus of Written Standard Slovene.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context.

Carlos Santos Armendariz

Mark Granroth-Wilding

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

2019

Extracting Data from Comparable Corpora.

Using Comparable Corpora for Under-Resourced Areas of Machine Translation, 2019

Appendices.

Ahmet Aker

Radu Ion

Nikos Mastropavlos

Monica Lestari Paramita

Using Comparable Corpora for Under-Resourced Areas of Machine Translation, 2019

How to tag non-standard language: Normalisation versus domain adaptation for Slovene historical and user-generated texts.

Katja Zupan

Nat. Lang. Eng., 2019

CoSimLex: A Resource for Evaluating Graded Word Similarity in Context.

Carlos Santos Armendariz

Mark Granroth-Wilding

Kristiina Vaik

CoRR, 2019

KAS-term: Extracting Slovene Terms from Doctoral Theses via Supervised Machine Learning.

Proceedings of Text, Speech, and Dialogue - 22nd International Conference, 2019

The FRENK Datasets of Socially Unacceptable Discourse in Slovene and English.

Proceedings of Text, Speech, and Dialogue - 22nd International Conference, 2019

What does Neural Bring? Analysing Improvements in Morphosyntactic Annotation and Lemmatisation of Slovenian, Croatian and Serbian.

Kaja Dobrovoljc

Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing, 2019

2018

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Comparing CRF and LSTM performance on the task of morphosyntactic tagging of non-standard varieties of South Slavic languages.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Predicting Concreteness and Imageability of Words Within and Across Languages via Word Embeddings.

Anita Peti-Stantic

Proceedings of The Third Workshop on Representation Learning for NLP, 2018

Bleaching Text: Abstract Features for Cross-lingual Gender Prediction.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Datasets of Slovene and Croatian Moderated News Comments.

Proceedings of the 2nd Workshop on Abusive Language Online, 2018

2017

Crawl and crowd to bring machine translation to under-resourced languages.

Raphael Rubino

Andy Way

Lang. Resour. Evaluation, 2017

Findings of the VarDial Evaluation Campaign 2017.

Proceedings of the Fourth Workshop on NLP for Similar Languages, 2017

Language-independent Gender Prediction on Twitter.

Proceedings of the Second Workshop on NLP and Computational Social Science, 2017

Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages.

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

Adapting a State-of-the-Art Tagger for South Slavic Languages to Non-Standard Text.

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, 2017

Legal Framework, Dataset and Annotation Schema for Socially Unacceptable Online Discourse Practices in Slovene.

Proceedings of the First Workshop on Abusive Language Online, 2017

2016

Enlarging Scarce In-domain English-Croatian Corpus for SMT of MOOCs Using Serbian.

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task.

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Detecting Semantic Shifts in Slovene Twitterese.

Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016

Gold-Standard Datasets for Annotation of Slovene Computer-Mediated Communication.

Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016

Croatian Error-Annotated Corpus of Non-Professional Written Language.

Vanja Stefanec

Jelena Kuvac Kraljevic

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

New Inflectional Lexicons and Training Corpora for Improved Morphosyntactic Annotation of Croatian and Serbian.

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Producing Monolingual and Parallel Web Corpora at the Same Time - SpiderLing and Bitextor's Love Affair.

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Corpus-Based Diacritic Restoration for South Slavic Languages.

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene.

Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Automatic normalisation of the Swiss German ArchiMob corpus using character-level machine translation.

Víctor M. Sánchez-Cartagena

Proceedings of the 13th Conference on Natural Language Processing, 2016

Normalising Slovene data: historical texts vs. user-generated content.

Proceedings of the 13th Conference on Natural Language Processing, 2016

Abu-MaTran: automatic building of machine translation.

Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products, 2016

Dealing with Data Sparseness in SMT with Factored Models and Morphological Expansion: a Case Study on Croatian.

Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 2016

Collaborative Development of a Rule-Based Machine Translator between Croatian and Serbian.

Gema Ramírez-Sánchez

Proceedings of the 19th Annual Conference of the European Association for Machine Translation, 2016

TweetGeo - A Tool for Collecting, Processing and Analysing Geo-encoded Linguistic Data.

Tanja Samardzic

Curdin Derungs

Proceedings of COLING 2016, 2016

Closing a Gap in the Language Resources Landscape: Groundwork and Best Practices from Projects on Computer-mediated Communication in four European Countries.

Selected papers from the CLARIN Annual Conference 2016, 2016

A Global Analysis of Emoji Usage.

Proceedings of the 10th Web as Corpus Workshop, 2016

Private or Corporate? Predicting User Types on Twitter.

Proceedings of the 2nd Workshop on Noisy User-generated Text, 2016

2015

Discriminating Between Closely Related Languages on Twitter.

Denis Kranjcic

Informatica (Slovenia), 2015

*MWELex - MWE Lexica of Croatian, Slovene and Serbian Extracted from Parsed Corpora.

Kaja Dobrovoljc

Informatica (Slovenia), 2015

The slWaC Corpus of the Slovene Web.

Natasa Logar

Informatica (Slovenia), 2015

Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling.

Antonio Toral

Proceedings of the Tenth Workshop on Statistical Machine Translation, 2015

Predicting the Level of Text Standardness in User-generated Content.

Proceedings of Recent Advances in Natural Language Processing, 2015

Predicting Inflectional Paradigms and Lemmata of Unknown Words for Semi-automatic Expansion of Morphological Lexicons.

Miquel Esplà-Gomis

Nives Mikelic Preradovic

Proceedings of Recent Advances in Natural Language Processing, 2015

Abu-MaTran: Automatic building of Machine Translation.

Proceedings of the 18th Annual Conference of the European Association for Machine Translation, 2015

Regional Linguistic Data Initiative (ReLDI).

Tanja Samardzic

Maja Milicevic

Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015

Universal Dependencies for Croatian (that work for Serbian, too).

Proceedings of the 5th Workshop on Balto-Slavic Natural Language Processing, 2015

2014

A Report on the DSL Shared Task 2014.

Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, 2014

Quality Estimation for Synthetic Parallel Data Generation.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

caWaC - A web corpus of Catalan and its application to language modeling and machine translation.

Antonio Toral

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

TweetCaT: a tool for building Twitter corpora of smaller languages.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Comparing two acquisition systems for automatically building an English-Croatian parallel corpus from multilingual websites.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

The SETimes.HR Linguistically Annotated Corpus of Croatian.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Standardizing Tweets with Character-Level Machine Translation.

Proceedings of Computational Linguistics and Intelligent Text Processing, 2014

{bs, hr, sr}WaC - Web Corpora of Bosnian, Croatian and Serbian.

Proceedings of the 9th Web as Corpus Workshop, 2014

2013

Vector Disambiguation for Translation Extraction from Comparable Corpora.

Marianna Apidianaki

Informatica (Slovenia), 2013

Cross-lingual WSD for Translation Extraction from Comparable Corpora.

Marianna Apidianaki

Proceedings of the Sixth Workshop on Building and Using Comparable Corpora, 2013

Identifying false friends between closely related languages.

Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, 2013

Lemmatization and Morphosyntactic Tagging of Croatian and Serbian.

Danijela Merkler

Proceedings of the 4th Biennial International Workshop on Balto-Slavic Natural Language Processing, 2013

2012

Addressing polysemy in bilingual lexicon extraction from comparable corpora.

Ozren Kubelka

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Efficient Discrimination Between Closely Related Languages.

Jörg Tiedemann

Proceedings of COLING 2012, 2012

2011

Bootstrapping Bilingual Lexicons from Comparable Corpora for Closely Related Languages.

Proceedings of Text, Speech and Dialogue - 14th International Conference, 2011

hrWaC and slWaC: Compiling Web Corpora for Croatian and Slovene.

Proceedings of Text, Speech and Dialogue - 14th International Conference, 2011

Bilingual lexicon extraction from comparable corpora for closely related languages.

Proceedings of Recent Advances in Natural Language Processing, 2011

Building and Using Comparable Corpora for Domain-Specific Bilingual Lexicon Extraction.

Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web, 2011

2010

Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?

Petra Bago

Damir Boras

J. Comput. Inf. Technol., 2010

Building a Gold Standard for Event Detection in Croatian.

Tomislava Lauc

Damir Boras

Proceedings of the International Conference on Language Resources and Evaluation, 2010

Towards Sentiment Analysis of Financial Texts in Croatian.

Marko Tadic

Proceedings of the International Conference on Language Resources and Evaluation, 2010

2008

Generating a Morphological Lexicon of Organization Entity Names.
