Marcos Zampieri

ORCID: 0000-0002-2346-3847

Affiliations:
  • George Mason University, Fairfax, VA, USA
  • Rochester Institute of Technology, Rochester, NY, USA (former)


According to our database, Marcos Zampieri authored at least 160 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Health text simplification: An annotated corpus for digestive cancer education and novel strategies for reinforcement learning.
J. Biomed. Informatics, 2024

ARTICLE: Annotator Reliability Through In-Context Learning.
CoRR, 2024

Claim Verification in the Age of Large Language Models: A Survey.
CoRR, 2024

Towards Generalized Offensive Language Identification.
CoRR, 2024

EmoMix-3L: A Code-Mixed Dataset for Bangla-English-Hindi Emotion Detection.
CoRR, 2024

Collaborative Design for Job-Seekers with Autism: A Conceptual Framework for Future Research.
CoRR, 2024

A Federated Learning Approach to Privacy Preserving Offensive Language Identification.
CoRR, 2024

MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thoughts.
CoRR, 2024

MultiLS: A Multi-task Lexical Simplification Framework.
CoRR, 2024

MasonTigers at SemEval-2024 Task 9: Solving Puzzles with an Ensemble of Chain-of-Thought Prompts.
Proceedings of the 18th International Workshop on Semantic Evaluation, 2024

MasonTigers at SemEval-2024 Task 1: An Ensemble Approach for Semantic Textual Relatedness.
Proceedings of the 18th International Workshop on Semantic Evaluation, 2024

Classifying Human-Generated and AI-Generated Election Claims in Social Media.
Proceedings of the 21st International Conference on Security and Cryptography, 2024

DISC: A Dataset for Information Security Classification.
Proceedings of the 21st International Conference on Security and Cryptography, 2024

Native Language Identification in Texts: A Survey.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

CSEPrompts: A Benchmark of Introductory Computer Science Prompts.
Proceedings of the Foundations of Intelligent Systems - 27th International Symposium, 2024

A Survey of Multimodal Sarcasm Detection.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Rater Cohesion and Quality from a Vicarious Perspective.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Deep Contrastive Active Learning for Out-of-domain Filtering in Dialog Systems.
Proceedings of the 11th IEEE International Conference on Data Science and Advanced Analytics, 2024

Language Variety Identification with True Labels.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

MentalHelp: A Multi-Task Dataset for Mental Health in Social Media.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

MasonPerplexity at Multimodal Hate Speech Event Detection 2024: Hate Speech and Target Detection Using Transformer Ensembles.
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text, 2024


GMU at MLSP 2024: Multilingual Lexical Simplification with Transformer Models.
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications, 2024

2023
OffensEval 2023: Offensive language identification in the age of Large Language Models.
Nat. Lang. Eng., November, 2023

Preface: Special issue on NLP approaches to offensive content online.
Nat. Lang. Eng., November, 2023

Offensive language identification with multi-task learning.
J. Intell. Inf. Syst., June, 2023

Features of lexical complexity: insights from L1 and L2 speakers.
Frontiers Artif. Intell., February, 2023

Lexical Complexity Prediction: An Overview.
ACM Comput. Surv., 2023

nlpBDpatriots at BLP-2023 Task 2: A Transfer Learning Approach to Bangla Sentiment Analysis.
CoRR, 2023

nlpBDpatriots at BLP-2023 Task 1: A Two-Step Classification for Violence Inciting Text Detection in Bangla.
CoRR, 2023

Offensive Language Identification in Transliterated and Code-Mixed Bangla.
CoRR, 2023

OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification.
CoRR, 2023

SentMix-3L: A Bangla-English-Hindi Code-Mixed Dataset for Sentiment Analysis.
CoRR, 2023

Deep Learning Approaches to Lexical Simplification: A Survey.
CoRR, 2023

Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification.
CoRR, 2023

Vicarious Offense and Noise Audit of Offensive Speech Classifiers.
CoRR, 2023

Findings of the VarDial Evaluation Campaign 2023.
Proceedings of the Tenth Workshop on NLP for Similar Languages, Varieties and Dialects, 2023

Publish or Hold? Automatic Comment Moderation in Luxembourgish News Articles.
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, 2023

A Text-to-Text Model for Multilingual Offensive Language Identification.
Proceedings of the Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023, 2023

Overview of the HASOC Subtrack at FIRE 2023: Hate-Speech Identification in Sinhala and Gujarati.
Proceedings of the Working Notes of FIRE 2023, 2023

Overview of the HASOC Subtracks at FIRE 2023: Hate Speech and Offensive Content Identification in Assamese, Bengali, Bodo, Gujarati and Sinhala.
Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023

Vicarious Offense and Noise Audit of Offensive Speech Classifiers: Unifying Human and Machine Disagreement on What is Offensive.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Understanding the Language of ADHD and Autism Communities on Social Media.
Proceedings of the IEEE International Conference on Big Data, 2023

ALEXSIS+: Improving Substitute Generation and Selection for Lexical Simplification with Information Retrieval.
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications, 2023

Target-Based Offensive Language Identification.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Teacher and Student Models of Offensive Language in Social Media.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
An Ensemble Approach for Annotating Source Code Identifiers With Part-of-Speech Tags.
IEEE Trans. Software Eng., 2022

Multilingual Offensive Language Identification for Low-resource Languages.
ACM Trans. Asian Low Resour. Lang. Inf. Process., 2022

Predicting the type and target of offensive social media posts in Marathi.
Soc. Netw. Anal. Min., 2022

Predicting lexical complexity in English texts: the CompLex 2.0 dataset.
Lang. Resour. Evaluation, 2022

Lexical simplification benchmarks for English, Portuguese, and Spanish.
Frontiers Artif. Intell., 2022

SOLD: Sinhala Offensive Language Dataset.
CoRR, 2022

Transfer Learning Methods for Domain Adaptation in Technical Logbook Datasets.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Overview of the HASOC Subtrack at FIRE 2022: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages.
Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022

Overview of the HASOC Subtrack at FIRE 2022: Offensive Language Identification in Marathi.
Proceedings of the Working Notes of FIRE 2022, 2022

ALEXSIS-PT: A New Resource for Portuguese Lexical Simplification.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

2021
An Evaluation of Multilingual Offensive Language Identification Methods for the Languages of India.
Inf., 2021

The Role of Machine Translation Quality Estimation in the Post-Editing Workflow.
Informatics, 2021

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages.
CoRR, 2021

Domain-specific MT for Low-resource Languages: The case of Bambara-French.
Proceedings of the 2nd AfricaNLP Workshop Proceedings, AfricaNLP@EACL 2021, Virtual Event, 2021

Predicting Lexical Complexity in English Texts.
CoRR, 2021


Comparing Approaches to Dravidian Language Identification.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

Findings of the VarDial Evaluation Campaign 2021.
Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, 2021

SemEval-2021 Task 1: Lexical Complexity Prediction.
Proceedings of the 15th International Workshop on Semantic Evaluation, 2021

WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans.
Proceedings of the 15th International Workshop on Semantic Evaluation, 2021

LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction.
Proceedings of the 15th International Workshop on Semantic Evaluation, 2021

Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi.
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), 2021

MUDES: Multilingual Detection of Offensive Spans.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, 2021

WLV-RIT at GermEval 2021: Multitask Learning with Transformers to Detect Toxic, Engaging, and Fact-Claiming Comments.
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments, 2021

Transformer Models for Offensive Language Identification in Marathi.
Proceedings of the Working Notes of FIRE 2021, 2021

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech.
Proceedings of the FIRE 2021: Forum for Information Retrieval Evaluation, Virtual Event, India, December 13, 2021

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages.
Proceedings of the Working Notes of FIRE 2021, 2021

fBERT: A Neural Transformer for Identifying Offensive Content.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

A Computational Exploration of Pejorative Language in Social Media.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

An Exploratory Analysis of the Relation between Offensive Language and Mental Health.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Handling Extreme Class Imbalance in Technical Logbook Datasets.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020).
Dataset, July, 2020

Natural language processing for similar languages, varieties, and dialects: A survey.
Nat. Lang. Eng., 2020

Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara.
CoRR, 2020

A Large-Scale Semi-Supervised Dataset for Offensive Language Identification.
CoRR, 2020

Assessing Human Translations from French to Bambara for Machine Learning: a Pilot Study.
Proceedings of the 1st AfricaNLP Workshop Proceedings, 2020

CompLex - A New Corpus for Lexical Complexity Prediction from Likert Scale Data.
CoRR, 2020

Neural Machine Translation for Similar Languages: The Case of Indo-Aryan Languages.
Proceedings of the Fifth Conference on Machine Translation, 2020


A Report on the VarDial Evaluation Campaign 2020.
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 2020

SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020).
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

CompLex - A New Corpus for Lexical Complexity Prediction from Likert Scale Data.
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties, 2020

Offensive Language Identification in Greek.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

NLP Tools for Predictive Maintenance Records in MaintNet.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations, 2020

WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language Identification in Code-switched YouTube Comments.
Proceedings of the Working Notes of FIRE 2020, 2020

Multilingual Offensive Language Identification with Cross-lingual Embeddings.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

MaintNet: A Collaborative Open-Source Library for Predictive Maintenance Language Resources.
Proceedings of the 28th International Conference on Computational Linguistics, 2020

Evaluating Aggression Identification in Social Media.
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying, 2020

Health Care Misinformation: an Artificial Intelligence Challenge for Low-resource languages.
Proceedings of the AAAI Fall Symposium on AI for Social Good, 2020

2019
Preface.
Nat. Lang. Eng., 2019

Automatic Language Identification in Texts: A Survey.
J. Artif. Intell. Res., 2019

UDS-DFKI Submission to the WMT2019 Similar Language Translation Shared Task.
CoRR, 2019

Experiments in Cuneiform Language Identification.
CoRR, 2019

UDS-DFKI Submission to the WMT2019 Czech-Polish Similar Language Translation Shared Task.
Proceedings of the Fourth Conference on Machine Translation, 2019

Findings of the 2019 Conference on Machine Translation (WMT19).
Proceedings of the Fourth Conference on Machine Translation, 2019

SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval).
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019

UTFPR at SemEval-2019 Task 5: Hate Speech Identification with Recurrent Neural Networks.
Proceedings of the 13th International Workshop on Semantic Evaluation, 2019

Predicting the Type and Target of Offensive Posts in Social Media.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Improving CAT Tools in the Translation Workflow: New Approaches and Evaluation.
Proceedings of Machine Translation Summit XVII Volume 2: Translator, 2019

BRUMS at HASOC 2019: Deep Learning Models for Multilingual Hate Speech and Offensive Language Identification.
Proceedings of the Working Notes of FIRE 2019, 2019

Large-scale Data Harvesting for Biographical Data.
Proceedings of the Third Conference on Biographical Data in a Digital World 2019, 2019

2018
Challenges in discriminating profanity from hate speech.
J. Exp. Theor. Artif. Intell., 2018

Classifier Ensembles for Dialect and Language Variety Identification.
CoRR, 2018

Language Identification and Morphosyntactic Tagging: The Second VarDial Evaluation Campaign.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

A Neural Approach to Language Variety Translation.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Discriminating between Indo-Aryan Languages Using SVM Ensembles.
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects, 2018

Portuguese Native Language Identification.
Proceedings of the Computational Processing of the Portuguese Language, 2018

LIdioms: A Multilingual Linked Idioms Data Set.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

RDF2PT: Generating Brazilian Portuguese Texts from RDF Data.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Benchmarking Aggression Identification in Social Media.
Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying, 2018

A Report on the Complex Word Identification Shared Task 2018.
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications@NAACL-HLT 2018, 2018

A Portuguese Native Language Identification Dataset.
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications@NAACL-HLT 2018, 2018

Classifying Patent Applications with Ensemble Methods.
Proceedings of the Australasian Language Technology Association Workshop 2018, 2018

2017
Compiling and Processing Historical and Contemporary Portuguese Corpora.
CoRR, 2017

Linguistic Features of Genre and Method Variation in Translation: A Computational Perspective.
CoRR, 2017

Findings of the VarDial Evaluation Campaign 2017.
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

Arabic Dialect Identification Using iVectors and ASR Transcripts.
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

German Dialect Identification in Interview Transcriptions.
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects, 2017

Predicting the Law Area and Decisions of French Supreme Court Cases.
Proceedings of the International Conference Recent Advances in Natural Language Processing, 2017

Detecting Hate Speech in Social Media.
Proceedings of the International Conference Recent Advances in Natural Language Processing, 2017

Exploring the Use of Text Classification in the Legal Domain.
Proceedings of the Second Workshop on Automated Semantic Analysis of Information in Legal Texts co-located with the 16th International Conference on Artificial Intelligence and Law (ICAIL 2017), 2017

Including Dialects and Language Varieties in Author Profiling.
Proceedings of the Working Notes of CLEF 2017, 2017

Native Language Identification on Text and Speech.
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, 2017

Complex Word Identification: Challenges in Data Annotation and System Performance.
Proceedings of the 4th Workshop on Natural Language Processing Techniques for Educational Applications, 2017

2016
Improving translation memory matching and retrieval using paraphrases.
Mach. Transl., 2016

USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing.
Proceedings of the First Conference on Machine Translation, 2016


Discriminating between Similar Languages and Arabic Dialect Identification: A Report on the Third DSL Shared Task.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Arabic Dialect Identification in Speech Transcripts.
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects, 2016

Grammatical Annotation of Historical Portuguese: Generating a Corpus-Based Diachronic Dictionary.
Proceedings of the Text, Speech, and Dialogue - 19th International Conference, 2016

MacSaar at SemEval-2016 Task 11: Zipfian and Character Features for Complex Word Identification.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

MAZA at SemEval-2016 Task 11: Detecting Lexical Complexity Using a Decision Stump Meta-Classifier.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

LTG at SemEval-2016 Task 11: Complex Word Identification with Classifier Ensembles.
Proceedings of the 10th International Workshop on Semantic Evaluation, 2016

Predicting Post Severity in Mental Health Forums.
Proceedings of the 3rd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2016

Modeling Language Change in Historical Corpora: The Case of Portuguese.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

CATaLog Online: Porting a Post-editing Tool to the Web.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Discriminating Similar Languages: Evaluations and Explorations.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

CATaLog Online: A Web-based CAT Tool for Distributed Translation with Data Capture for APE and Translation Process Research.
Proceedings of the COLING 2016, 2016

2015
Investigating Genre and Method Variation in Translation Using Text Classification.
Proceedings of the Text, Speech, and Dialogue - 18th International Conference, 2015

AMBRA: A Ranking Approach to Temporal Text Classification.
Proceedings of the 9th International Workshop on Semantic Evaluation, 2015

Searching for Context: a Study on Document-Level Labels for Translation Quality Estimation.
Proceedings of the 18th Annual Conference of the European Association for Machine Translation, 2015

Can Translation Memories afford not to use paraphrasing?
Proceedings of the 18th Annual Conference of the European Association for Machine Translation, 2015

2014
A Report on the DSL Shared Task 2014.
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, 2014

Between Sound and Spelling: Combining Phonetics and Clustering Algorithms to Improve Target Word Recovery.
Proceedings of the Advances in Natural Language Processing, 2014

VarClass: An Open-source Language Identification Tool for Language Varieties.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Quantifying the Influence of MT Output in the Translators' Performance: A Case Study in Technical Translation.
Proceedings of the Workshop on Humans and Computer-assisted Translation, 2014

Temporal Text Ranking and Automatic Dating of Texts.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

2013
Stylistic Changes for Temporal Text Classification.
Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

N-gram Language Models and POS Distribution for the Identification of Spanish Varieties (Ngrammes et Traits Morphosyntaxiques pour la Identification de Variétés de l'Espagnol) [in French].
Proceedings of the Traitement Automatique des Langues Naturelles, 2013

Effective Spell Checking Methods Using Clustering Algorithms.
Proceedings of the Recent Advances in Natural Language Processing, 2013

Improving Native Language Identification with TF-IDF Weighting.
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, 2013

2012
Automatic identification of language varieties: The case of Portuguese.
Proceedings of the 11th Conference on Natural Language Processing, 2012

2010
P-AWL: Academic Word List for Portuguese.
Proceedings of the Computational Processing of the Portuguese Language, 2010

