Catherine Arnett

According to our database, Catherine Arnett authored at least 9 papers between 2023 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Toxicity of the Commons: Curating Open-Source Pre-Training Data.
CoRR, 2024

Goldfish: Monolingual Language Models for 350 Languages.
CoRR, 2024

Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics.
CoRR, 2024

Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement.
CoRR, 2024

A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages.
CoRR, 2024

BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models.
CoRR, 2023

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

