SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation.
CoRR, 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.
CoRR, 2024
Can Large Language Models Understand Spatial Audio?
CoRR, 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SALMONN: Towards Generic Hearing Abilities for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Connecting Speech Encoder and Large Language Model for ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Extending Large Language Models for Speech and Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023
Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021