Yinghao Aaron Li

IEEE J. Sel. Top. Signal Process., January, 2025

2024

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion.

CoRR, 2024

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation.

CoRR, 2024

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis.

Xilin Jiang

Adrian Nicolas Florea

CoRR, 2024

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience.

CoRR, 2024

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain.

CoRR, 2024

Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform.

CoRR, 2023

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes.

Xilin Jiang

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation.

Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

2022

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

2021

StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion.

Ali Zare