2025

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.

Guanrou Yang

Chen Yang

CoRR, April, 2025

InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.

CoRR, March, 2025

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction.

CoRR, January, 2025

Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.

CoRR, 2024

Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.

CoRR, 2024

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition.

CoRR, 2024

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.

CoRR, 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.

CoRR, 2024

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.

CoRR, 2024

CTC-Assisted LLM-Based Contextual ASR.

Proceedings of the IEEE Spoken Language Technology Workshop, 2024

MaLa-ASR: Multimedia-Assisted LLM-Based ASR.

Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.

CoRR, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.

CoRR, 2023

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

FunASR: A Fundamental End-to-End Speech Recognition Toolkit.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Extremely Low Footprint End-to-End ASR System for Smart Device.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model.

CoRR, 2020

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

2019

An Effective Deep Embedding Learning Architecture for Speaker Verification.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018