Vít Suchomel

Jan Kraus

Proceedings of the 16th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2022

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.

Marta Bañón

Miquel Esplà-Gomis

Mikel L. Forcada

Cristian García-Romero

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

2021

Website Properties in Relation to the Quality of Text Extracted for Web Corpora.

Jan Kraus

Proceedings of the 15th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2021

2020

Removing Spam from Web Corpora Through Supervised Learning and Semi-manual Classification of Web Sites.

Proceedings of the 14th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2020

Current Challenges in Web Corpus Building.

Proceedings of the 12th Web as Corpus Workshop, 2020

2019

A New Approach for Semi-Automatic Building and Extending a Multilingual Terminology Thesaurus.

Int. J. Artif. Intell. Tools, 2019

Discriminating Between Similar Languages Using Large Web Corpora.

Proceedings of the 13th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2019

2018

csTenTen17, a Recent Czech Web Corpus.

Proceedings of the 12th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2018

2016

DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model.

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Annotated Amharic Corpora.

Pavel Rychlý

Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

Terminology Extraction for Academic Slovene Using Sketch Engine.

Darja Fišer

Miloš Jakubíček

Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016

2015

Corpus Based Extraction of Hypernyms.

Proceedings of the 9th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2015

2014

arTenTen: Arabic Corpus and Word Sketches.

J. King Saud Univ. Comput. Inf. Sci., 2014

Semiautomatic Building and Extension of Terminological Thesaurus for Land Surveying Domain.

Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

Intelligent Search and Replace for Czech Phrases.

Zuzana Nevěřilová

Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

Text Tokenisation Using unitok.

Jan Michelfeit

Jan Pomikálek

Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

SkELL: Web Interface for English Language Learning.

Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Finding Terms in Corpora for Many Languages with the Sketch Engine.

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

2013

Intrinsic Methods for Comparison of Corpora.

Proceedings of the 7th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2013

2012

Recent Czech Web Corpora.

Proceedings of the 6th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2012

Towards 100M Morphologically Annotated Corpus of Tajik.

Gulshan Dovudov

Pavel Šmerk

Proceedings of the 6th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2012

Detecting Spam Content in Web Corpora.

Proceedings of the 6th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2012

2011

Practical Web Crawling for Text Corpora.

Jan Pomikálek

Proceedings of the 5th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2011

chared: Character Encoding Detection with a Known Language.

Jan Pomikálek