Colin Raffel

According to our database, Colin Raffel authored at least 107 papers between 2010 and 2024.

Bibliography

2024
Combining Machine Learning and Lifetime-Based Resource Management for Memory Allocation and Beyond.
Commun. ACM, April, 2024

Merging by Matching Models in Task Parameter Subspaces.
Trans. Mach. Learn. Res., 2024

Soft Merging of Experts with Adaptive Routing.
Trans. Mach. Learn. Res., 2024

A Survey on Data Selection for Language Models.
Trans. Mach. Learn. Res., 2024

Realistic Evaluation of Model Merging for Compositional Generalization.
CoRR, 2024

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning.
CoRR, 2024

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale.
CoRR, 2024

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models.
CoRR, 2024

Learning to Route Among Specialized Experts for Zero-Shot Generalization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Building Machine Learning Models Like Open Source Software.
Commun. ACM, February, 2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Efficient Methods for Natural Language Processing: A Survey.
Trans. Assoc. Comput. Linguistics, 2023

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP.
Trans. Assoc. Comput. Linguistics, 2023

Scaling Up Models and Data with t5x and seqio.
J. Mach. Learn. Res., 2023

Merging by Matching Models in Task Subspaces.
CoRR, 2023

Efficient Online Data Mixing For Language Model Pre-Training.
CoRR, 2023

ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization.
CoRR, 2023

NPEFF: Non-Negative Per-Example Fisher Factorization.
CoRR, 2023

Resolving Interference When Merging Models.
CoRR, 2023

TIES-Merging: Resolving Interference When Merging Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scaling Data-Constrained Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Distributed Inference and Fine-tuning of Large Language Models Over The Internet.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models.
Proceedings of the International Conference on Machine Learning, 2023

Large Language Models Struggle to Learn Long-Tail Knowledge.
Proceedings of the International Conference on Machine Learning, 2023

Bidirectional Language Models Are Also Few-shot Learners.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Knowledge is a Region in Weight Space for Fine-tuned Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Evaluating the Factual Consistency of Large Language Models Through News Summarization.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Crosslingual Generalization through Multitask Finetuning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Petals: Collaborative Inference and Fine-tuning of Large Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

2022
Emergent Abilities of Large Language Models.
Trans. Mach. Learn. Res., 2022

ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models.
Trans. Assoc. Comput. Linguistics, 2022

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning.
CoRR, 2022

Evaluating the Factual Consistency of Large Language Models Through Summarization.
CoRR, 2022

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
CoRR, 2022

What Language Model to Train if You Have One Million GPU Hours?
CoRR, 2022

Petals: Collaborative Inference and Fine-tuning of Large Models.
CoRR, 2022

Efficient Methods for Natural Language Processing: A Survey.
CoRR, 2022

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?
CoRR, 2022

Scaling Up Models and Data with t5x and seqio.
CoRR, 2022

PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts.
CoRR, 2022

Compositional Generalization in Unsupervised Compositional Representation Learning: A Study on Disentanglement and Emergent Language.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Combinatorial Perspective on the Optimization of Shallow ReLU Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Merging Models with Fisher-Weighted Averaging.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization?
Proceedings of the International Conference on Machine Learning, 2022

Deduplicating Training Data Mitigates Privacy Risks in Language Models.
Proceedings of the International Conference on Machine Learning, 2022

What Language Model to Train if You Have One Million GPU Hours?
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Learning with Limited Text Data.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, 2022

2021
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP.
CoRR, 2021

Multitask Prompted Training Enables Zero-Shot Task Generalization.
CoRR, 2021

Extracting Training Data from Large Language Models.
Proceedings of the 30th USENIX Security Symposium, 2021

Training Neural Networks with Fixed Sparse Masks.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Robust and Generalizable Visual Representation Learning via Random Convolutions.
Proceedings of the 9th International Conference on Learning Representations, 2021

Improving and Simplifying Pattern Exploiting Training.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
J. Mach. Learn. Res., 2020

WT5?! Training Text-to-Text Models to Explain their Predictions.
CoRR, 2020

Deflecting Adversarial Attacks.
CoRR, 2020

Top-K Training of GANs: Improving Generators by Making Critics Less Critical.
CoRR, 2020

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Top-k Training of GANs: Improving GAN Performance by Throwing Away Bad Samples.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions.
Proceedings of the 8th International Conference on Learning Representations, 2020

ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring.
Proceedings of the 8th International Conference on Learning Representations, 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Learning-based Memory Allocation for C++ Server Workloads.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring.
CoRR, 2019

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling.
CoRR, 2019

MixMatch: A Holistic Approach to Semi-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition.
Proceedings of the 36th International Conference on Machine Learning, 2019

Towards GAN Benchmarks Which Require Generalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Understanding and Improving Interpolation in Autoencoders via an Adversarial Regularizer.
Proceedings of the 7th International Conference on Learning Representations, 2019

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Hickle: A HDF5-based python pickle replacement.
J. Open Source Softw., 2018

Learning a Latent Space of Multitrack Measures.
CoRR, 2018

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Onsets and Frames: Dual-Objective Piano Transcription.
Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music.
Proceedings of the 35th International Conference on Machine Learning, 2018

Is Generator Conditioning Causally Related to GAN Performance?
Proceedings of the 35th International Conference on Machine Learning, 2018

Realistic Evaluation of Semi-Supervised Learning Algorithms.
Proceedings of the 6th International Conference on Learning Representations, 2018

Monotonic Chunkwise Attention.
Proceedings of the 6th International Conference on Learning Representations, 2018

Thermometer Encoding: One Hot Way To Resist Adversarial Examples.
Proceedings of the 6th International Conference on Learning Representations, 2018

Learning Hard Alignments with Variational Inference.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Online and Linear-Time Attention by Enforcing Monotonic Alignments.
Proceedings of the 34th International Conference on Machine Learning, 2017

Training a Subsampling Mechanism in Expectation.
Proceedings of the 5th International Conference on Learning Representations, 2017

Explaining the Learning Dynamics of Direct Feedback Alignment.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Learning-Based Methods for Comparing Sequences, with Applications to Audio-to-MIDI Alignment and Matching.
PhD thesis, 2016

Theano: A Python framework for fast computation of mathematical expressions.
CoRR, 2016

Extracting Ground-Truth Information from MIDI Files: A MIDIfesto.
Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016

Pruning subsequence search with attention-based embedding.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Optimizing DTW-based audio-to-MIDI alignment and matching.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games Using Convolutional Networks.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games.
CoRR, 2015

Feed-Forward Networks with Attention Can Solve Some Long-Term Memory Problems.
CoRR, 2015

librosa: Audio and Music Signal Analysis in Python.
Proceedings of the 14th Python in Science Conference 2015 (SciPy 2015), Austin, Texas, July 6, 2015

Large-Scale Content-Based Matching of MIDI and Audio Files.
Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015

2014
MIR_EVAL: A Transparent Implementation of Common MIR Metrics.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

Estimating timing and channel distortion across related signals.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2010
The Lattice Harp: A New Hybrid Instrument And Controller.
Proceedings of the 2010 International Computer Music Conference, 2010

