Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Decoupled structure for improved adaptability of end-to-end models.
Speech Commun., 2024
Transducer-Llama: Integrating LLMs into Streamable Transducer-based Speech Recognition.
CoRR, 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching.
CoRR, 2024
CoT-ST: Enhancing LLM-based Speech Translation with Multimodal Chain-of-Thought.
CoRR, 2024
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning.
CoRR, 2024
FastInject: Injecting Unpaired Text Data into CTC-Based ASR Training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Adaptable End-to-End ASR Models Using Replaceable Internal LMs and Residual Softmax.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
ETEH: Unified Attention-Based End-to-End ASR and KWS Architecture.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022
Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model.
CoRR, 2021
Improving Accent Identification and Accented Speech Recognition Under a Framework of Self-Supervised Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
History Utterance Embedding Transformer LM for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021
Improving Hybrid CTC/Attention End-to-End Speech Recognition with Pretrained Acoustic and Language Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021