2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.
CoRR, April 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.
CoRR, January 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), February 25, 2025

2024
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.
CoRR, 2024

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition.
CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024

CTC-Assisted LLM-Based Contextual ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
Findings of the Association for Computational Linguistics, 2024

2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
CoRR, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021
Extremely Low Footprint End-to-End ASR System for Smart Device.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020
Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model.
CoRR, 2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019
An Effective Deep Embedding Learning Architecture for Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018