Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens.
CoRR, March, 2025
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease.
Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 2024
Harmony: Heterogeneous Multi-Modal Federated Learning through Disentangled Model Training.
Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services, 2023
ASR-Free Pronunciation Assessment.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
CN-Celeb: A Challenging Chinese Speaker Recognition Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020