Yixuan Zhou
ORCID: 0009-0002-6363-891X
Affiliations:
- Tsinghua University, Shenzhen International Graduate School, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen, China
According to our database, Yixuan Zhou authored at least 17 papers between 2021 and 2024.
Bibliography
2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024
2023
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023
2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech.
CoRR, 2021
Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal.
Proceedings of the IEEE International Conference on Acoustics, 2021