2024

Scaling Speech Technology to 1,000+ Languages.

…, Paden Tomasello, …, Maryam Fazel-Zarandi, …

J. Mach. Learn. Res., 2024

2023

OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav.

…, Oleksandr Maksymets, …

CoRR, 2023

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language.

Proceedings of the International Conference on Machine Learning, 2023

Measuring the Impact of Domain Factors in Self-Supervised Pre-Training.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Toward Joint Language Modeling for Speech Units and Text.

…, Chung-Ming Chien, …

Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Introducing Semantics into Speech Encoders.

…, Akshat Shrivastava, …, Liang-Hsuan Tseng, …

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Introducing Semantics into Speech Encoders.

…, Akshat Shrivastava, …, Liang-Hsuan Tseng, …

CoRR, 2022

Offline Visual Representation Learning for Embodied Navigation.

…, Vincent-Pierre Berges, …, Oleksandr Maksymets

CoRR, 2022

Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training.

CoRR, 2022

Towards End-to-End Unsupervised Speech Recognition.

Alexander H. Liu, …

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Masked Autoencoders that Listen.

…, Wojciech Galuba, …, Christoph Feichtenhofer

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Simple and Effective Zero-shot Cross-lingual Phoneme Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

On-demand compute reduction with stochastic wav2vec 2.0.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Wav2Vec-Aug: Improved self-supervised training with limited data.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Simple and Effective Unsupervised Speech Synthesis.

Alexander H. Liu, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale.

…, Kushal Lakhotia, …, Patrick von Platen, …

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language.

Proceedings of the International Conference on Machine Learning, 2022

Improved Language Identification Through Cross-Lingual Self-Supervised Learning.

…, Diptanu Gon Choudhury, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Unified Speech-Text Pre-training for Speech Translation and Recognition.

…, Abdelrahman Mohamed, …, Juan Miguel Pino

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Improved Language Identification Through Cross-Lingual Self-Supervised Learning.

…, Diptanu Gon Choudhury, …

CoRR, 2021

Generative Spoken Language Modeling from Raw Audio.

Kushal Lakhotia, Evgeny Kharitonov, …, Abdelrahman Mohamed, Emmanuel Dupoux

CoRR, 2021

Unsupervised Speech Recognition.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Large-Scale Self- and Semi-Supervised Learning for Speech Translation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 – September 3, 2021

Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.

…, Tatiana Likhomanenko, …, Ronan Collobert, Gabriel Synnaeve, …

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 – September 3, 2021

Unsupervised Cross-Lingual Representation Learning for Speech Recognition.

…, Ronan Collobert, Abdelrahman Mohamed, …

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 – September 3, 2021

A Comparison of Discrete Latent Variable Models for Speech Representation Learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Self-Training and Pre-Training are Complementary for Speech Recognition.

…, Tatiana Likhomanenko, Paden Tomasello, …, Ronan Collobert, Gabriel Synnaeve, …

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Reservoir Transformers.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

Multilingual Speech Translation from Efficient Finetuning of Pretrained Models.

…, Juan Miguel Pino, …

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Reservoir Transformer.

CoRR, 2020

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling.

…, Maureen de Seyssel, …, Morgane Rivière, Evgeny Kharitonov, …, Emmanuel Dupoux

CoRR, 2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.

…, Abdelrahman Mohamed, …

CoRR, 2020

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.

…, Abdelrahman Mohamed, …

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations.

…, Steffen Schneider, …

Proceedings of the 8th International Conference on Learning Representations, 2020

Effectiveness of Self-Supervised Pre-Training for ASR.

…, Abdelrahman Mohamed

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Effectiveness of self-supervised pre-training for speech recognition.

…, Abdelrahman Mohamed

CoRR, 2019

Facebook FAIR's WMT19 News Translation Task Submission.

Proceedings of the Fourth Conference on Machine Translation, 2019

fairseq: A Fast, Extensible Toolkit for Sequence Modeling.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Pre-trained language model representations for language generation.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

wav2vec: Unsupervised Pre-Training for Speech Recognition.

Steffen Schneider, …, Ronan Collobert, …

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Pay Less Attention with Lightweight and Dynamic Convolutions.

…, Yann N. Dauphin, …

Proceedings of the 7th International Conference on Learning Representations, 2019

Adaptive Input Representations for Neural Language Modeling.

Proceedings of the 7th International Conference on Learning Representations, 2019

Cloze-driven Pretraining of Self-attention Networks.

…, Luke Zettlemoyer, …

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019