David R. Mortensen

Orcid: 0000-0002-3927-6851

Affiliations:
  • Carnegie Mellon University, Pittsburgh, Language Technologies Institute


According to our database, David R. Mortensen authored at least 68 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
A Review of the Applications of Deep Learning-Based Emergent Communication.
Trans. Mach. Learn. Res., 2024

Self-supervised Speech Representations Still Struggle with African American Vernacular English.
CoRR, 2024

ELCC: the Emergent Language Corpus Collection.
CoRR, 2024

Carrot and Stick: Inducing Self-Motivation with Positive & Negative Feedback.
CoRR, 2024

Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction.
CoRR, 2024

Neural Proto-Language Reconstruction.
CoRR, 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.
CoRR, 2024

Mitigating the Linguistic Gap with Phonemic Representations for Robust Multilingual Language Understanding.
CoRR, 2024

Automating Sound Change Prediction for Phylogenetic Inference: A Tukanoan Case Study.
CoRR, 2024

XferBench: a Data-Driven Benchmark for Emergent Language.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

PWESuite: Phonetic Word Embeddings and Tasks They Facilitate.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Phonotactic Complexity across Dialects.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Improved Neural Protoform Reconstruction via Reflex Prediction.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Semisupervised Neural Proto-Language Reconstruction.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
PWESuite: Phonetic Word Embeddings and Tasks They Facilitate.
CoRR, 2023

Construction Grammar Provides Unique Insight into Neural Language Models.
CoRR, 2023

ChatGPT MT: Competitive for High- (but Not Low-) Resource Languages.
Proceedings of the Eighth Conference on Machine Translation, 2023

Multilingual TTS Accent Impressions for Accented ASR.
Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

Generalized Glossing Guidelines: An Explicit, Human- and Machine-Readable, Item-and-Process Convention for Morphological Annotation.
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, 2023

SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing.
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, 2023

Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

African Substrates Rather Than European Lexifiers to Augment African-diaspora Creole Translation.
Proceedings of the 4th Workshop on African Natural Language Processing, 2023

Transformed Protoform Reconstruction.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

2022
Mathematically Modeling the Lexicon Entropy of Emergent Language.
CoRR, 2022

Recommendations for Systematic Research on Emergent Language.
CoRR, 2022

Modeling Emergent Lexicon Formation with a Self-Reinforcing Stochastic Process.
CoRR, 2022

AUTOLEX: An Automatic Framework for Linguistic Exploration.
CoRR, 2022

Learning the Ordering of Coordinate Compounds and Elaborate Expressions in Hmong, Lahu, and Chinese.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

A Hmong Corpus with Elaborate Expression Annotations.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Phone Inventories and Recognition for Every Language.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Data-adaptive Transfer Learning for Translation: A Case Study in Haitian and Jamaican.
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages, 2022

When Is TTS Augmentation Through a Pivot Language Useful?
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ASR2K: Speech Recognition for Around 2000 Languages without Audio.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

WikiHan: A New Comparative Dataset for Chinese Languages.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Quantifying Cognitive Factors in Lexical Decline.
Trans. Assoc. Comput. Linguistics, 2021

Phoneme Recognition through Fine Tuning of Phonetic Representations: a Case Study on Luhya Language Varieties.
Proceedings of the 2nd AfricaNLP Workshop Proceedings, AfricaNLP@EACL 2021, Virtual Event, 2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multilingual Phonetic Dataset for Low Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Evaluating the Morphosyntactic Well-formedness of Generated Texts.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

2020
Ranking Transfer Languages with Pragmatically-Motivated Features for Multilingual Sentiment Analysis.
CoRR, 2020

Where New Words Are Born: Distributional Semantic Analysis of Neologisms and Their Semantic Neighborhoods.
CoRR, 2020

Characterizing Sociolinguistic Variation in the Competing Vaccination Communities.
Proceedings of the Social, Cultural, and Behavioral Modeling, 2020

AlloVera: A Multilingual Allophone Database.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Universal Phone Recognition with a Multilingual Allophone System.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Automatic Extraction of Rules Governing Morphological Agreement.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Towards Zero-Shot Learning for Automatic Phonemic Transcription.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Low-Resource Machine Translation using Interlinear Glosses.
CoRR, 2019

CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology.
CoRR, 2019

The ARIEL-CMU Systems for LoReHLT18.
CoRR, 2019

2018
The ARIEL-CMU situation frame detection pipeline for LoReHLT16: a model translation approach.
Mach. Transl., 2018

Epitran: Precision G2P for Many Languages.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Parser combinators for Tigrinya and Oromo morphology.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Adapting Word Embeddings to New Languages with Morphological and Phonological Subword Representations.
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

2017
URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors.
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, 2017

2016
Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning.
Proceedings of the NAACL HLT 2016, 2016

Bridge-Language Capitalization Inference in Western Iranian: Sorani, Kurmanji, Zazaki, and Tajik.
Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016, 2016

Phonologically Aware Neural Model for Named Entity Recognition in Low Resource Transfer Settings.
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

PanPhon: A Resource for Mapping IPA Segments to Articulatory Feature Vectors.
Proceedings of the COLING 2016, 2016

Named Entity Recognition for Linguistic Rapid Response in Low-Resource Languages: Sorani Kurdish and Tajik.
Proceedings of the COLING 2016, 2016

