Xinfa Zhu

Orcid: 0000-0001-9275-523X

According to our database, Xinfa Zhu authored at least 23 papers between 2022 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Timeline

[Chart: number of publications per year, 2022 to 2024, broken down by publication type (book, in proceedings, article, PhD thesis, dataset, other).]


Bibliography

2024
METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Autoregressive Speech Synthesis with Next-Distribution Prediction.
CoRR, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls.
CoRR, 2024

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion.
CoRR, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.
CoRR, 2024

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy.
CoRR, 2024

UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-Supervised Contrastive Learning.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

SELM: Speech Enhancement using Discrete Tokens and Language Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SponTTS: Modeling and Transferring Spontaneous Style for TTS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Accent-VITS: accent transfer for end-to-end TTS.
CoRR, 2023

SponTTS: modeling and transferring spontaneous style for TTS.
CoRR, 2023

Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning.
CoRR, 2023

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation.
CoRR, 2023

Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.
IEEE Signal Process. Lett., 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.
CoRR, 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

