2024
Scaling Speech Technology to 1,000+ Languages.
J. Mach. Learn. Res., 2024
2023
OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav.
CoRR, 2023
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language.
Proceedings of the International Conference on Machine Learning, 2023
Measuring the Impact of Domain Factors in Self-Supervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023
Toward Joint Language Modeling for Speech Units and Text.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Av-Data2Vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Introducing Semantics into Speech Encoders.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
Introducing Semantics into Speech Encoders.
CoRR, 2022
Offline Visual Representation Learning for Embodied Navigation.
CoRR, 2022
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training.
CoRR, 2022
Towards End-to-End Unsupervised Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Masked Autoencoders that Listen.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Simple and Effective Zero-shot Cross-lingual Phoneme Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
On-demand compute reduction with stochastic wav2vec 2.0.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Wav2Vec-Aug: Improved self-supervised training with limited data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Simple and Effective Unsupervised Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language.
Proceedings of the International Conference on Machine Learning, 2022
Improved Language Identification Through Cross-Lingual Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022
Unified Speech-Text Pre-training for Speech Translation and Recognition.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
2021
Improved Language Identification Through Cross-Lingual Self-Supervised Learning.
CoRR, 2021
Generative Spoken Language Modeling from Raw Audio.
CoRR, 2021
Unsupervised Speech Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Large-Scale Self- and Semi-Supervised Learning for Speech Translation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Unsupervised Cross-Lingual Representation Learning for Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Comparison of Discrete Latent Variable Models for Speech Representation Learning.
Proceedings of the IEEE International Conference on Acoustics, 2021
Self-Training and Pre-Training are Complementary for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Multilingual Speech Translation from Efficient Finetuning of Pretrained Models.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
2020
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling.
CoRR, 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.
CoRR, 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations.
Proceedings of the 8th International Conference on Learning Representations, 2020
Effectiveness of Self-Supervised Pre-Training for ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
Effectiveness of self-supervised pre-training for speech recognition.
CoRR, 2019
Facebook FAIR's WMT19 News Translation Task Submission.
Proceedings of the Fourth Conference on Machine Translation, 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
Pre-trained language model representations for language generation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
wav2vec: Unsupervised Pre-Training for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Pay Less Attention with Lightweight and Dynamic Convolutions.
Proceedings of the 7th International Conference on Learning Representations, 2019
Adaptive Input Representations for Neural Language Modeling.
Proceedings of the 7th International Conference on Learning Representations, 2019
Cloze-driven Pretraining of Self-attention Networks.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019