EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting.
CoRR, April, 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation.
CoRR, March, 2025
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models.
CoRR, 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap.
CoRR, 2024
Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition.
CoRR, 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
CoRR, 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs.
CoRR, 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity.
CoRR, 2024
CTC-Assisted LLM-Based Contextual ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR.
Proceedings of the 25th Annual Conference of the International Speech Communication Association, 2024
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT.
CoRR, 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
CoRR, 2023
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Extremely Low Footprint End-to-End ASR System for Smart Device.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Universal ASR: Unifying Streaming and Non-Streaming ASR Using a Single Encoder-Decoder Model.
CoRR, 2020
Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
An Effective Deep Embedding Learning Architecture for Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018