Aline Villavicencio

Orcid: 0000-0002-3731-9168

  • University of Sheffield, UK
  • Federal University of Rio Grande do Sul, Porto Alegre, Brazil (former)

According to our database, Aline Villavicencio authored at least 100 papers between 1995 and 2024.

Assessing linguistic generalisation in language models: a dataset for Brazilian Portuguese.
Lang. Resour. Evaluation, March, 2024

Representation transfer and data cleaning in multi-views for text simplification.
Pattern Recognit. Lett., January, 2024

Multi-perspective thought navigation for source-free entity linking.
Pattern Recognit. Lett., 2024

Vocabulary Expansion for Low-resource Cross-lingual Transfer.
CoRR, 2024

Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection.
CoRR, 2024

Is Less More? Quality, Quantity and Context in Idiom Processing with Natural Language Models.
CoRR, 2024

An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Generative LLM Inference.
CoRR, 2024

Word Boundary Information Isn't Useful for Encoder Language Models.
CoRR, 2024

ShefCDTeam at SemEval-2024 Task 4: A Text-to-Text Model for Multi-Label Classification.
Proceedings of the 18th International Workshop on Semantic Evaluation, 2024

Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Understanding the effects of negative (and positive) pointwise mutual information on word vectors.
J. Exp. Theor. Artif. Intell., November, 2023

FLYPE: Multitask Prompt Tuning for Multimodal Human Understanding of Social Media.
Proceedings of the 2nd International Workshop on Multimodal Human Understanding for the Web and Social Media co-located with the 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023), 2023

Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5.
CoRR, 2022

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding.
Proceedings of the 16th International Workshop on Semantic Evaluation, SemEval@NAACL 2022, 2022

Sample Efficient Approaches for Idiomaticity Detection.
Proceedings of the 18th Workshop on Multiword Expressions, 2022

Improving Tokenisation by Alternative Treatment of Spaces.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Unsupervised Word Segmentation from Discrete Speech Units in Low-Resource Settings.
CoRR, 2021

What if the whole is greater than the sum of the parts? Modelling Complex (Multiword) Expressions (invited paper).
Proceedings of the First Workshop on Current Trends in Text Simplification (CTTS 2021) co-located with the 37th Conference of the Spanish Society for Natural Language Processing (SEPLN2021), 2021

AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Probing for idiomaticity in vector space models.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Assessing the Representations of Idiomaticity in Vector Models with a Noun Compound Dataset Labeled at Type and Token Levels.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

CogNLP-Sheffield at CMCL 2021 Shared Task: Blending Cognitively Inspired Features with Transformer-based Language Models for Predicting Eye Tracking Patterns.
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, 2021

Investigating alignment interpretability for low-resource NMT.
Mach. Transl., 2020

Investigating Language Impact in Bilingual Approaches for Computational Language Documentation.
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages and Collaboration and Computing for Under-Resourced Languages, 2020

Discovering multiword expressions.
Nat. Lang. Eng., 2019

How the Brain Represents Language and Answers Questions? Using an AI System to Understand the Underlying Neurobiological Mechanisms.
Frontiers Comput. Neurosci., 2019

How Does Language Influence Documentation Workflow? Unsupervised Word Discovery Using Translations in Multiple Languages.
CoRR, 2019

Why So Down? The Role of Negative (and Positive) Pointwise Mutual Information in Distributional Semantics.
CoRR, 2019

Unsupervised Compositionality Prediction of Nominal Compounds.
Comput. Linguistics, 2019

When the whole is greater than the sum of its parts: Multiword expressions and idiomaticity.
Proceedings of the Joint Workshop on Multiword Expressions and WordNet, 2019

Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-Resource Settings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Incorporating Subword Information into Matrix Factorization Word Embeddings.
CoRR, 2018

A Small Griko-Italian Speech Translation Corpus.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018

A Corpus Study of Verbal Multiword Expressions in Brazilian Portuguese.
Proceedings of the Computational Processing of the Portuguese Language, 2018

Similarity Measures for the Detection of Clinical Conditions with Verbal Fluency Tasks.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018

The brWaC Corpus: A New Open Resource for Brazilian Portuguese.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Unsupervised Word Segmentation from Speech with Attention.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

Restricted Recurrent Neural Tensor Networks: Exploiting Word Frequency and Compositionality for Increased Model Capacity and Performance With No Computational Overhead.
CoRR, 2017

LexSubNC: A Dataset of Lexical Substitution for Nominal Compounds.
Proceedings of the IWCS 2017 - 12th International Conference on Computational Semantics - Short papers, Montpellier, France, September 19, 2017

Unwritten languages demand attention too! Word discovery with encoder-decoder models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Enhancing the LexVec Distributed Word Representation Model Using Positional Contexts and External Memory.
CoRR, 2016

UFRGS&LIF at SemEval-2016 Task 10: Rule-Based MWE Identification and Predominant-Supersense Tagging.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Joining Forces for Multiword Expression Identification.
Proceedings of the Computational Processing of the Portuguese Language, 2016

The Portuguese B²SG: A Semantic Test for Distributional Thesaurus.
Proceedings of the Computational Processing of the Portuguese Language, 2016

Crawling by Readability Level.
Proceedings of the Computational Processing of the Portuguese Language, 2016

Filtering and Measuring the Intrinsic Quality of Human Compositionality Judgments.
Proceedings of the 12th Workshop on Multiword Expressions, 2016

VerbLexPor: a lexical resource with semantic roles for Portuguese.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

B2SG: a TOEFL-like Task for Portuguese.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Multiword Expressions in Child Language.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

mwetoolkit+sem: Integrating Word Embeddings in the mwetoolkit for Semantic MWE Processing.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

How Naked is the Naked Truth? A Multilingual Lexicon of Nominal Compound Compositionality.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Predicting the Compositionality of Nominal Compounds: Giving Word Embeddings a Hard Time.
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016

Automatic Construction of Large Readability Corpora.
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity, 2016

VerbLexPor: um recurso léxico com anotação de papéis semânticos para o português (VerbLexPor: a lexical resource annotated with semantic roles for Portuguese).
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology, 2015

Distributional Thesauri for Portuguese: methodology evaluation.
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology, 2015

brWaC: A WaCky Corpus for Brazilian Portuguese.
Proceedings of the Computational Processing of the Portuguese Language, 2014

Comparing Similarity Measures for Distributional Thesauri.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Comparing the Quality of Focused Crawlers and of the Translation Resources Obtained from them.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Identification of Multiword Expressions in the brWaC.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Size Does Not Matter. Frequency Does. A Study of Features for Measuring Lexical Complexity.
Proceedings of the Advances in Artificial Intelligence - IBERAMIA 2014, 2014

Nothing like Good Old Frequency: Studying Context Filters for Distributional Thesauri.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

Computational Modeling as a Methodology for Studying Human Language Learning.
Proceedings of the Cognitive Aspects of Computational Language Acquisition, 2013

Introduction to the special issue on multiword expressions: From theory to practice and use.
ACM Trans. Speech Lang. Process., 2013

Language Acquisition and Probabilistic Models: keeping it simple.
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013

Syntax-Based Collocation Extraction, by Violeta Seretan. Berlin: Springer, 2011. ISBN-10 9400701330, ISBN-13 978-9400701335. $139.00/£90.00 (Hardcover) xi + 220 pages.
Nat. Lang. Eng., 2012

A large scale annotated child language construction database.
Proceedings of the Eighth International Conference on Language Resources and Evaluation, 2012

Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques.
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology, 2011

Extração e Validação de Ontologias a partir de Recursos Digitais.
Proceedings of Joint IV Seminar on Ontology Research in Brazil and VI International Workshop on Metamodels, 2011

Sistema de Aquisição Semi-Automática de Ontologias.
Proceedings of Joint IV Seminar on Ontology Research in Brazil and VI International Workshop on Metamodels, 2011

Identifying and Analyzing Brazilian Portuguese Complex Predicates.
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, 2011

Fast and Flexible MWE Candidate Generation with the mwetoolkit.
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, 2011

Identification and Treatment of Multiword Expressions Applied to Information Retrieval.
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, 2011

Alignment-based extraction of multiword expressions.
Lang. Resour. Evaluation, 2010

Identificação de Expressões Multipalavra em Domínios Específicos.
Linguamática, 2010

An Investigation on the Influence of Frequency on the Lexical Organization of Verbs.
Proceedings of TextGraphs@ACL 2010 Workshop on Graph-based Methods for Natural Language Processing, 2010

Question Answering for Portuguese: How Much Is Needed?
Proceedings of the Advances in Artificial Intelligence - SBIA 2010, 2010

A Hybrid Approach for Multiword Expression Identification.
Proceedings of the Computational Processing of the Portuguese Language, 2010

mwetoolkit: a Framework for Multiword Expression Identification.
Proceedings of the International Conference on Language Resources and Evaluation, 2010

COMUNICA - A Question Answering System for Brazilian Portuguese.
Proceedings of the COLING 2010, 2010

Multiword Expressions in the wild? The mwetoolkit comes in handy.
Proceedings of the COLING 2010, 2010

Web-based and combined language models: a case study on noun compound identification.
Proceedings of the COLING 2010, 2010

Prepositions in Applications: A Survey and Introduction to the Special Issue.
Comput. Linguistics, 2009

Statistically-Driven Alignment-Based Multiword Expression Identification for Technical Domains.
Proceedings of the Workshop on Multiword Expressions: Identification, 2009

Picking them up and Figuring them out: Verb-Particle Constructions, Noise and Idiomaticity.
Proceedings of the Twelfth Conference on Computational Natural Language Learning, 2008

UFRGS@CLEF2008: Indexing Multiword Expressions for Information Retrieval.
Proceedings of the Working Notes for CLEF 2008 Workshop co-located with the 12th European Conference on Digital Libraries (ECDL 2008), 2008

Validation and Evaluation of Automatically Acquired Multiword Expressions for Grammar Engineering.
Proceedings of the EMNLP-CoNLL 2007, 2007

Introduction to the special issue on multiword expressions: Having a crack at a hard nut.
Comput. Speech Lang., 2005

The availability of verb-particle constructions in lexical resources: How much is enough?
Comput. Speech Lang., 2005

A Multilingual Database of Idioms.
Proceedings of the Fourth International Conference on Language Resources and Evaluation, 2004

The acquisition of a unification-based generalised categorial grammar.
PhD thesis, 2002

Multiword expressions: linguistic precision and reusability.
Proceedings of the Third International Conference on Language Resources and Evaluation, 2002

Learning to Distinguish PP Arguments from Adjuncts.
Proceedings of the 6th Conference on Natural Language Learning, 2002

Extracting the Unextractable: A Case Study on Verb-particles.
Proceedings of the 6th Conference on Natural Language Learning, 2002

The Acquisition of Word Order by a Computational Learning System.
Proceedings of the Fourth Conference on Computational Natural Language Learning, 2000

Representing a System of Lexical Types Using Default Unification.
Proceedings of the EACL 1999, 1999

Part-of-Speech Tagging for Portuguese Texts.
Proceedings of the Advances in Artificial Intelligence, 1995

A Hierarchical Description of the Portuguese Verb.
Proceedings of the Advances in Artificial Intelligence, 1995
