Xinfa Zhu

Orcid: 0000-0001-9275-523X

According to our database, Xinfa Zhu authored at least 27 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of five.
Erdős number of four.

Bibliography

2025

CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions.

CoRR, January, 2025

OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia.

CoRR, January, 2025

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training.

CoRR, January, 2025

2024

METTS: Multilingual Emotional Text-to-Speech by Cross-Speaker and Cross-Lingual Emotion Transfer.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

U-Style: Cascading U-Nets With Multi-Level Speaker and Style Modeling for Zero-Shot Voice Cloning.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Autoregressive Speech Synthesis with Next-Distribution Prediction.

Xinfa Zhu, Wenjie Tian, Lei Xie

CoRR, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls.

CoRR, 2024

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion.

CoRR, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.

CoRR, 2024

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy.

CoRR, 2024

UniStyle: Unified Style Modeling for Speaking Style Captioning and Stylistic Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Contrastive Context-Speech Pretraining for Expressive Text-to-Speech Synthesis.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge.

Proceedings of the 14th IEEE International Symposium on Chinese Spoken Language Processing, 2024

Boosting Multi-Speaker Expressive Speech Synthesis with Semi-Supervised Contrastive Learning.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

SELM: Speech Enhancement using Discrete Tokens and Language Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Spontts: Modeling and Transferring Spontaneous Style for TTS.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

DiCLET-TTS: Diffusion Model Based Cross-Lingual Emotion Transfer for Text-to-Speech - A Study Between English and Mandarin.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Accent-VITS: accent transfer for end-to-end TTS.

CoRR, 2023

SponTTS: modeling and transferring spontaneous style for TTS.

CoRR, 2023

Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning.

CoRR, 2023

Vec-Tok Speech: speech vectorization and tokenization for neural speech generation.

CoRR, 2023

Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Cross-Speaker Emotion Transfer Through Information Perturbation in Emotional Speech Synthesis.

IEEE Signal Process. Lett., 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.

CoRR, 2022

The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
