2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.

Xinsheng Wang

Mingqi Jiang

CoRR, March, 2025

Audio-FLAN: A Preliminary Release.

CoRR, February, 2025

Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease.

Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024

2023

Harmony: Heterogeneous Multi-Modal Federated Learning through Disentangled Model Training.

Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 2023

2020

ASR-Free Pronunciation Assessment.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CN-Celeb: A Challenging Chinese Speaker Recognition Dataset.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020