Yixuan Zhou

Orcid: 0009-0002-6363-891X

Affiliations:
  • Tsinghua University, Shenzhen International Graduate School, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, Shenzhen, China


According to our database, Yixuan Zhou authored at least 17 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
SongCreator: Lyrics-based Universal Song Generation.
CoRR, 2024

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024

Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech.
CoRR, 2021

Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal.
Proceedings of the IEEE International Conference on Acoustics, 2021

