Krister Lindén

Proceedings of the Ancient Language Processing Workshop, 2023

2022

Language Report Finnish.

Wilhelmina Dyster

Proceedings of the European Language Equality, 2022

Lahjoita puhetta - a large-scale corpus of spoken Finnish with some benchmarks.

CoRR, 2022

Optimizing Naive Bayes for Arabic Dialect Identification.

Proceedings of the Seventh Arabic Natural Language Processing Workshop, 2022

HeLI-OTS, Off-the-shelf Language Identifier for Text.

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Language Identification as part of the Text Corpus Creation Pipeline at the Language Bank of Finland.

Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022

Lemmatizing and POS-tagging Akkadian with BabyLemmatizer and Dictionary-Based Post-Correction.

Proceedings of the Selected Papers from the CLARIN Annual Conference 2022, 2022

EU Data Governance Act: Outlining a Potential Role for CLARIN.

Proceedings of the Selected Papers from the CLARIN Annual Conference 2022, 2022

The Pipeline for Publishing Resources in the Language Bank of Finland.

Proceedings of the Selected Papers from the CLARIN Annual Conference 2022, 2022

2021

Naive Bayes-based Experiments in Romanian Dialect Identification.

Bharathi Raja Chakravarthi

Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Findings of the VarDial Evaluation Campaign 2021.

Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

The Interaction of Personal Data, Intellectual Property and Freedom of Expression in the Context of Language Research.

Vanessa Hannesschläger

Proceedings of the Selected Papers from the CLARIN Annual Conference 2021, 2021

Legal Issues Related to the Use of Twitter Data in Language Research.

Pawel Kamocki

Vanessa Hannesschläger

Proceedings of the Selected Papers from the CLARIN Annual Conference 2021, 2021

2020

A Finnish news corpus for named entity recognition.

Lang. Resour. Evaluation, 2020

Optical character recognition with neural networks and post-correction with finite state methods.

Int. J. Document Anal. Recognit., 2020

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpus.

CoRR, 2020

Uralic Language Identification (ULI) 2020 shared task dataset and the Wanca 2017 corpora.

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

Experiments in Language Variety Geolocation and Dialect Identification.

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

A Report on the VarDial Evaluation Campaign 2020.

Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

Akkadian Treebank for early Neo-Assyrian Royal Inscriptions.

Proceedings of the 19th International Workshop on Treebanks and Linguistic Theories, 2020

BabyFST - Towards a Finite-State Based Computational Model of Ancient Babylonian.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Automated Phonological Transcription of Akkadian Cuneiform Text.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

The European Language Technology Landscape in 2020: Language-Centric and Human-Centric AI for Cross-Cultural Communication in Multilingual Europe.

Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Improving Word Association Measures in Repetitive Corpora with Context Similarity Weighting.

Aleksi Sahala

Proceedings of the 12th International Joint Conference on Knowledge Discovery, 2020

Sharing is Caring: a Legal Perspective on Sharing Language Data Containing Personal Data and the Division of Liability between Researchers and Research Organisations.

Proceedings of the Selected Papers from the CLARIN Annual Conference 2020, 2020

Building Web Corpora for Minority Languages.

Proceedings of the 12th Web as Corpus Workshop, 2020

2019

Language model adaptation for language and dialect identification of text.

Nat. Lang. Eng., 2019

FinnTransFrame: translating frames in the FinnFrameNet project.

Lang. Resour. Evaluation, 2019

Automatic Language Identification in Texts: A Survey.

J. Artif. Intell. Res., 2019

Language and Dialect Identification of Cuneiform Texts.

CoRR, 2019

Improving OCR of historical newspapers and journals published in Finland.

Pekka Kauppinen

Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage, 2019

A CLARIN Contractual Framework for Sharing Personal Data for Scientific Research.

Aleksei Kelli

Alexandros Nousias

Proceedings of the Selected Papers from the CLARIN Annual Conference 2019, Leipzig, Germany, September 30, 2019

The Impact of Copyright and Personal Data Laws on the Creation and Use of Models for Language Technologies.

Proceedings of the Selected Papers from the CLARIN Annual Conference 2019, Leipzig, Germany, September 30, 2019

2018

The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction.

SIGIR Forum, 2018

From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences (Dagstuhl Perspectives Workshop 17442).

Dagstuhl Manifestos, 2018

HeLI-based Experiments in Swiss German Dialect Identification.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

HeLI-based Experiments in Discriminating Between Dutch and Flemish Subtitles.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Iterative Language Model Adaptation for Indo-Aryan Language Identification.

Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Rethinking Summarization and Storytelling for Modern Social Multimedia.

Stevan Rudinac

Tat-Seng Chua

Nicolás E. Díaz Ferreyra

Proceedings of the MultiMedia Modeling - 24th International Conference, 2018

Processing personal data without the consent of the data subject for the development and use of language resources.

Proceedings of the Selected papers from the CLARIN Annual Conference 2018, 2018

2017

Evaluating HeLI with Non-Linear Mappings.

Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

Evaluation of language identification methods using 285 languages.

Proceedings of the 21st Nordic Conference on Computational Linguistics, 2017

OCR and post-correction of historical Finnish texts.

Pekka Kauppinen

Proceedings of the 21st Nordic Conference on Computational Linguistics, 2017

Implementation of an Open Science Policy in the context of management of CLARIN language resources: a need for changes?

Proceedings of the Selected papers from the CLARIN Annual Conference 2017, 2017

2016

FinnPos: an open-source morphological tagging and lemmatization toolkit for Finnish.

Lang. Resour. Evaluation, 2016

The strategic impact of META-NET on the regional, national and international level.

Lang. Resour. Evaluation, 2016

HeLI, a Word-Based Backoff Method for Language Identification.

Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

In-Document Adaptation for a Human Guided Automatic Transcription Service.

André Mansikkaniemi

Mikko Kurimo

Proceedings of the Speech and Computer - 18th International Conference, 2016

2015

Using HFST - Helsinki Finite-State Technology for Recognizing Semantic Frames.

Proceedings of the Systems and Frameworks for Computational Morphology, 2015

Extracting Semantic Frames using hfst-pmatch.

Sam Hardwick

Proceedings of the 20th Nordic Conference of Computational Linguistics, 2015

Automated Lossless Hyper-Minimization for Morphological Analyzers.

Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing, 2015

The Regulatory and Contractual Framework as an Integral Part of the CLARIN Infrastructure.

Aleksei Kelli

Kadri Vider

Proceedings of the Selected Papers from the CLARIN Annual Conference 2015, 2015

Language Set Identification in Noisy Synthetic Multilingual Documents.

Proceedings of the Computational Linguistics and Intelligent Text Processing, 2015

2014

Is it possible to create a very large wordnet in 100 days? An evaluation.

Jyrki Niemi

Lang. Resour. Evaluation, 2014

CLARA: A New Generation of Researchers in Common Language Resources and Their Applications.

Przemyslaw Lenkiewicz

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

HFST-SweNER - A New NER Resource for Swedish.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Heuristic Hyper-minimization of Finite State Lexicons.

Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Accelerated Estimation of Conditional Random Fields using a Pseudo-Likelihood-inspired Perceptron Variant.

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

State-of-the-Art in Weighted Finite-State Spell-Checking.

Proceedings of the Computational Linguistics and Intelligent Text Processing, 2014

Part-of-Speech Tagging using Conditional Random Fields: Exploiting Sub-Label Dependencies for Improved Accuracy.

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013

Using HFST for Creating Computational Linguistic Applications.

Proceedings of the Computational Linguistics - Applications, 2013

HFST - A System for Creating NLP Tools.

Proceedings of the Systems and Frameworks for Computational Morphology, 2013

Baltic and Nordic Parts of the European Linguistic Infrastructure.

Gyri Smørdal Losnegaard

Proceedings of the 19th Nordic Conference of Computational Linguistics, 2013

Nordic and Baltic wordnets aligned and compared through "WordTies".

Proceedings of the 19th Nordic Conference of Computational Linguistics, 2013

2012

Specifying Treebanks, Outsourcing Parsebanks: FinnTreeBank 3.

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Creation of an Open Shared Language Resource Repository in the Nordic and Baltic Countries.

Kristín Jóhannsdóttir

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Representing the Translation Relation in a Bilingual Wordnet.

Jyrki Niemi

Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Predictive Text Entry for Agglutinative Languages Using Unsupervised Morphological Segmentation.

Mirka Hyvärinen

Proceedings of the Computational Linguistics and Intelligent Text Processing, 2012

Extending and Updating the Finnish Wordnet.

Jyrki Niemi

Mirka Hyvärinen

Proceedings of the Shall We Play the Festschrift Game?, 2012

2011

HFST - Framework for Compiling and Applying Morphologies.

Proceedings of the Systems and Frameworks for Computational Morphology, 2011

Combining Statistical Models for POS Tagging using Finite-State Calculus.

Proceedings of the 18th Nordic Conference of Computational Linguistics, 2011

Do wordnets also improve human performance on NLP tasks?

Kristiina Muhonen

Proceedings of the 18th Nordic Conference of Computational Linguistics, 2011

META-NORD: Towards Sharing of Language Resources in Nordic and Baltic Countries.

Proceedings of the Workshop on Language Resources, 2011

2010

Part-of-Speech Tagging Using Parallel Weighted Finite-State Transducers.

Proceedings of the Advances in Natural Language Processing, 2010

Building and Using Existing Hunspell Dictionaries and TeX Hyphenators as Finite-State Automata.

Proceedings of the International Multiconference on Computer Science and Information Technology, 2010

2009

Corpus-Based Lexeme Ranking for Morphological Guessers.

Jussi Tuovila

Proceedings of the State of the Art in Computational Morphology, 2009

HFST Tools for Morphology - An Efficient Open-Source Package for Construction of Morphological Analyzers.

Proceedings of the State of the Art in Computational Morphology, 2009

Conflict Resolution Using Weighted Rules in HFST-TWOLC.

Proceedings of the 17th Nordic Conference of Computational Linguistics, 2009

Corpus-based Paradigm Selection for Morphological Entries.

Jussi Tuovila

Proceedings of the 17th Nordic Conference of Computational Linguistics, 2009

Weighted Finite-State Morphological Analysis of Finnish Compounding with HFST-LEXC.

Proceedings of the 17th Nordic Conference of Computational Linguistics, 2009

Guessers for Finite-State Transducer Lexicons.

Proceedings of the Computational Linguistics and Intelligent Text Processing, 2009

2008

A Probabilistic Model for Guessing Base Forms of New Words by Analogy.

Proceedings of the Computational Linguistics and Intelligent Text Processing, 2008

2006

Multilingual modeling of cross-lingual spelling variants.

Inf. Retr., 2006

2004

Evaluation of Linguistic Features for Word Sense Disambiguation with Self-Organized Document Maps.

Comput. Humanit., 2004

Finding Cross-Lingual Spelling Variants.
