EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model.
CoRR, April, 2025
OmniCam: Unified Multimodal Video Generation via Camera Control.
CoRR, April, 2025
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment.
CoRR, March, 2025
Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis.
CoRR, February, 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios.
CoRR, January, 2025
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration.
Proceedings of the ACM on Web Conference 2025, 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
VoxDialogue: Can Spoken Dialogue Systems Understand Information Beyond Words?
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation.
Proceedings of the 31st International Conference on Computational Linguistics, 2025
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
WavChat: A Survey of Spoken Dialogue Models.
CoRR, 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling.
CoRR, 2024
ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling.
CoRR, 2024
SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024