2025
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing.
CoRR, 2024

2023
Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2020
WaveFlow: A Compact Flow-based Model for Raw Audio.
Proceedings of the 37th International Conference on Machine Learning, 2020

Non-Autoregressive Neural Text-to-Speech.
Proceedings of the 37th International Conference on Machine Learning, 2020

Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework.
Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019
Multi-Speaker End-to-End Speech Synthesis.
CoRR, 2019

Parallel Neural Text-to-Speech.
CoRR, 2019

ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Neural Voice Cloning with a Few Samples.
Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Deep Voice 3: 2000-Speaker Neural Text-to-Speech.
CoRR, 2017

Deep Voice 2: Multi-Speaker Neural Text-to-Speech.
Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017