Vit Suchomel

According to our database, Vit Suchomel authored at least 28 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining.
CoRR, 2024

2023
MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 2023

2022
Semi-Manual Annotation of Topics and Genres in Web Corpora, The Cheap and Fast Way.
Proceedings of the 16th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2022

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages.
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation, 2022

2021
Website Properties in Relation to the Quality of Text Extracted for Web Corpora.
Proceedings of the 15th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2021

2020
Removing Spam from Web Corpora Through Supervised Learning and Semi-manual Classification of Web Sites.
Proceedings of the 14th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2020

Current Challenges in Web Corpus Building.
Proceedings of the 12th Web as Corpus Workshop, 2020

2019
A New Approach for Semi-Automatic Building and Extending a Multilingual Terminology Thesaurus.
Int. J. Artif. Intell. Tools, 2019

Discriminating Between Similar Languages Using Large Web Corpora.
Proceedings of the 13th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2019

2018
csTenTen17, a Recent Czech Web Corpus.
Proceedings of the 12th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2018

2016
DSL Shared Task 2016: Perfect Is The Enemy of Good Language Discrimination Through Expectation-Maximization and Chunk-based Language Model.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Annotated Amharic Corpora.
Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

Terminology Extraction for Academic Slovene Using Sketch Engine.
Proceedings of the 10th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2016

2015
Corpus Based Extraction of Hypernyms.
Proceedings of the 9th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2015

2014
arTenTen: Arabic Corpus and Word Sketches.
J. King Saud Univ. Comput. Inf. Sci., 2014

Semiautomatic Building and Extension of Terminological Thesaurus for Land Surveying Domain.
Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

Intelligent Search and Replace for Czech Phrases.
Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

Text Tokenisation Using unitok.
Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

SkELL: Web Interface for English Language Learning.
Proceedings of the 8th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2014

HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Finding Terms in Corpora for Many Languages with the Sketch Engine.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

2013
Intrinsic Methods for Comparison of Corpora.
Proceedings of the 7th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2013

2012
Recent Czech Web Corpora.
Proceedings of the 6th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2012

Towards 100M Morphologically Annotated Corpus of Tajik.
Proceedings of the 6th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2012

Detecting Spam Content in Web Corpora.
Proceedings of the 6th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2012

2011
Practical Web Crawling for Text Corpora.
Proceedings of the 5th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2011

chared: Character Encoding Detection with a Known Language.
Proceedings of the 5th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2011

Building a 50M Corpus of Tajik Language.
Proceedings of the 5th Workshop on Recent Advances in Slavonic Natural Languages Processing, 2011

