2024
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation.
CoRR, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.
CoRR, 2024

Can Large Language Models Understand Spatial Audio?
CoRR, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Connecting Speech Encoder and Large Language Model for ASR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extending Large Language Models for Speech and Audio Captioning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.
CoRR, 2023

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2021
Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021