Jinyu Li
Orcid: 0000-0002-1089-9748Affiliations:
- Microsoft Corporation, Redmond, WA, USA
- Georgia Institute of Technology, Center for Signal and Image Processing, Atlanta, GA, USA (PhD)
- University of Science and Technology of China, iFlytek Speech Lab, Hefei, China
According to our database1,
Jinyu Li
authored at least 234 papers
between 2000 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on linkedin.com
-
on orcid.org
On csauthors.net:
Bibliography
2024
VatLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning.
IEEE Trans. Multim., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation.
CoRR, 2024
Target word activity detector: An approach to obtain ASR word boundaries without lexicon.
CoRR, 2024
CoRR, 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech.
CoRR, 2024
Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation.
CoRR, 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment.
CoRR, 2024
CoRR, 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.
CoRR, 2024
CoRR, 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.
CoRR, 2024
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Residualtransformer: Residual Low-Rank Learning With Weight-Sharing For Transformer Layers.
Proceedings of the IEEE International Conference on Acoustics, 2024
Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
2023
VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR, 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.
CoRR, 2023
LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
2022
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE J. Sel. Top. Signal Process., 2022
Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition.
CoRR, 2022
LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.
CoRR, 2022
CoRR, 2022
CoRR, 2022
Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Have Best of Both Worlds: Two-Pass Hybrid and E2E Cascading Framework for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2022
Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE Signal Process. Lett., 2021
CoRR, 2021
CoRR, 2021
CoRR, 2021
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
2020
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.
CoRR, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
IEEE CAA J. Autom. Sinica, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Unsupervised adaptation with domain separation networks for robust speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Challenges in and Solutions to Deep Learning Network Acoustic Modeling in Speech Recognition Products at Microsoft.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
2016
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Small-footprint high-performance deep neural network-based speech recognition using split-VQ.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
IEEE ACM Trans. Audio Speech Lang. Process., 2014
Variable-activation and variable-input deep neural network for robust speech recognition.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Feature space maximum a posteriori linear regression for adaptation of deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network.
Proceedings of the IEEE International Conference on Acoustics, 2014
Feature compensation using linear combination of speaker and environment dependent correction vectors.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems.
IEEE Trans. Speech Audio Process., 2013
IET Signal Process., 2013
Proceedings of the 1st International Conference on Learning Representations, 2013
Restructuring of deep neural network acoustic models with singular value decomposition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Investigations on hessian-free optimization for cross-entropy training of deep neural networks.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
2012
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
IEEE ACM Trans. Audio Speech Lang. Process., 2011
Feature Normalization Using Structured Full Transforms for Robust Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Maximum likelihood adaptation of histogram equalization with constraint for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011
2010
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Word confidence calibration using a maximum entropy model with constraints on confidence and word distributions.
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions.
Comput. Speech Lang., 2009
Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
Proceedings of the 3rd International Universal Communication Symposium, 2009
A study on soft margin estimation of linear regression parameters for speaker adaptation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Ensemble speaker and speaking environment modeling approach with advanced online estimation process.
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009
2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
On a generalization of margin-based discriminative training to robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
HMM adaptation using a phase-sensitive acoustic distortion model for environment-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
IEEE Trans. Speech Audio Process., 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
High-performance hmm adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
2005
Application of E<i>alpha</i>Nets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition.
Proceedings of the Neural Nets, 16th Italian Workshop on Neural Nets, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000