2024

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation.

Wenyi Yu

Siyin Wang

CoRR, 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation.

CoRR, 2024

Can Large Language Models Understand Spatial Audio?

CoRR, 2024

video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SALMONN: Towards Generic Hearing Abilities for Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Connecting Speech Encoder and Large Language Model for ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Extending Large Language Models for Speech and Audio Captioning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023

Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models.

CoRR, 2023

Improving Frame-level Classifier for Word Timings with Non-peaky CTC in End-to-End Automatic Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2021

Emitting Word Timings with HMM-Free End-to-End System in Automatic Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021