2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions.

Kai Chen

Yunhao Gou

CoRR, 2024

Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data.

CoRR, 2024

Exploring SSL Discrete Tokens for Multilingual ASR.

CoRR, 2024

ToneUnit: A Speech Discretization Approach for Tonal Language Speech Synthesis.

CoRR, 2024

2022

Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue.

Daxin Tan

Nikos Kargas

David McHardy

Constantinos Papayiannis

Antonio Bonafonte

Marek Strelec

Jonas Rohnke

Agis Oikonomou-Filandras

Trevor Wood

CoRR, 2022

CorrectSpeech: A Fully Automated System for Speech Correction and Accent Reduction.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Environment Aware Text-to-Speech Synthesis.

Daxin Tan

Guangyan Zhang

Tan Lee

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Study on the Efficacy of Model Pre-Training In Developing Neural Text-to-Speech System.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

CUHK-EE voice cloning system for ICASSP 2021 M2VoC challenge.

CoRR, 2021

Applying the Information Bottleneck Principle to Prosodic Representation Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 to September 3, 2021

Fine-Grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement.

Daxin Tan

Tan Lee

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 to September 3, 2021

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Fine-grained style modelling and transfer in text-to-speech synthesis via content-style disentanglement.

Daxin Tan

Tan Lee

CoRR, 2020