EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.
CoRR, 2024
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data.
CoRR, 2024
Exploring SSL Discrete Tokens for Multilingual ASR.
CoRR, 2024
ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis.
CoRR, 2024
Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue.
CoRR, 2022
CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Environment Aware Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge.
CoRR, 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement.
CoRR, 2020