Jinyu Li

ORCID: 0000-0002-1089-9748

Affiliations:
  • Microsoft Corporation, Redmond, WA, USA
  • Georgia Institute of Technology, Center for Signal and Image Processing, Atlanta, GA, USA (PhD)
  • University of Science and Technology of China, iFlytek Speech Lab, Hefei, China


According to our database, Jinyu Li authored at least 234 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
VatLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning.
IEEE Trans. Multim., 2024

SpeechLM: Enhanced Speech Pre-Training With Unpaired Textual Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

VioLA: Conditional Language Models for Speech Recognition, Synthesis, and Translation.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation.
CoRR, 2024

Target word activity detector: An approach to obtain ASR word boundaries without lexicon.
CoRR, 2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation.
CoRR, 2024

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech.
CoRR, 2024

Autoregressive Speech Synthesis without Vector Quantization.
CoRR, 2024

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation.
CoRR, 2024

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment.
CoRR, 2024

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS.
CoRR, 2024

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers.
CoRR, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation.
CoRR, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations.
CoRR, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis.
CoRR, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model.
CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
CoRR, 2024

Boosting Large Language Model for Speech Synthesis: An Empirical Study.
CoRR, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Diarist: Streaming Speech Translation with Speaker Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2024

T-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability.
Proceedings of the IEEE International Conference on Acoustics, 2024

Residualtransformer: Residual Low-Rank Learning With Weight-Sharing For Transformer Layers.
Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.
CoRR, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.
CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.
CoRR, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.
CoRR, 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.
CoRR, 2023

LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accelerating Transducers through Adjacent Token Merging.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speaker Change Detection For Transformer Transducer ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Vararray Meets T-Sot: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.
Proceedings of the IEEE International Conference on Acoustics, 2023

CTCBERT: Advancing Hidden-Unit Bert with CTC Objectives.
Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
IEEE J. Sel. Top. Signal Process., 2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition.
CoRR, 2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.
CoRR, 2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding.
CoRR, 2022

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task.
CoRR, 2022

Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Configurable Multilingual Model is All You Need to Recognize All Languages.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

Have Best of Both Worlds: Two-Pass Hybrid and E2E Cascading Framework for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2022

Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Unispeech-Sat: Universal Speech Representation Learning With Speaker Aware Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2022

Factorized Neural Transducer for Efficient Language Model Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Speaker Separation Using Speaker Inventories and Estimated Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Streaming End-to-End Multi-Talker Speech Recognition.
IEEE Signal Process. Lett., 2021

Self-Supervised Learning for speech recognition with Intermediate layer supervision.
CoRR, 2021

Recent Advances in End-to-End Automatic Speech Recognition.
CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.
CoRR, 2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.
CoRR, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

Ensemble Combination between Different Time Segmentations.
Proceedings of the IEEE International Conference on Acoustics, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.
Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2021

On Addressing Practical Challenges for RNN-Transducer.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.
CoRR, 2020

Adaptation Algorithms for Speech Recognition: An Overview.
CoRR, 2020

Continuous Speech Separation with Conformer.
CoRR, 2020

Continuous speech separation: dataset and analysis.
CoRR, 2020

An End-to-End Architecture of Online Multi-Channel Speech Separation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Combination of End-to-End and Hybrid Models for Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploring Transformers for Large-Scale Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Sequence-Level Self-Learning with Multiple Hypotheses.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency End-to-End Streaming Speech Recognition with a Scout Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semantic Mask for Transformer Based End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

L-Vector: Neural Label Embedding for Domain Adaptation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Advancing Acoustic-to-Word CTC Model With Attention and Mixed-Units.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Layer Trajectory BLSTM.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Adaptation for Attention-Based End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic-to-Phrase Models for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Adversarial Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Conditional Teacher-student Learning.
Proceedings of the IEEE International Conference on Acoustics, 2019

Attentive Adversarial Learning for Domain-invariant Training.
Proceedings of the IEEE International Conference on Acoustics, 2019

Adversarial Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Towards Code-switching ASR for End-to-end CTC Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

Improving Layer Trajectory LSTM with Future Context Frames.
Proceedings of the IEEE International Conference on Acoustics, 2019

Universal Acoustic Modeling Using Neural Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, 2019

CNN with Phonetic Attention for Text-Independent Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Speech Separation Using Speaker Inventory.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Character-Aware Attention-Based End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improving RNN Transducer Modeling for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Recent Progresses in Deep Learning based Acoustic Models (Updated).
CoRR, 2018

Speaker-Invariant Training via Adversarial Learning.
CoRR, 2018

Speaker Adaptation for End-to-End CTC Models.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Exploring Layer Trajectory LSTM with Depth Processing Units and Attention.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Adversarial Feature-Mapping for Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Cycle-Consistent Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Layer Trajectory LSTM.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improved Training for Online End-to-end Speech Recognition Systems.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Domain and Speaker Adaptation for Cortana Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speaker-Invariant Training Via Adversarial Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Developing Far-Field Speaker System Via Teacher-Student Learning.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Advancing Acoustic-to-Word CTC Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Advancing Connectionist Temporal Classification with Attention Modeling.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Recent progresses in deep learning based acoustic models.
IEEE CAA J. Autom. Sinica, 2017

Large-Scale Domain Adaptation via Teacher-Student Learning.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Extended low-rank plus diagonal adaptation for deep and recurrent neural networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Unsupervised adaptation with domain separation networks for robust speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Acoustic-to-word model without OOV.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Cracking the cocktail party problem by multi-beam deep attractor network.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Challenges in and Solutions to Deep Learning Network Acoustic Modeling in Speech Recognition Products at Microsoft.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016
Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

End-to-End attention based text-dependent speaker verification.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Low-rank plus diagonal adaptation for deep neural networks.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Recurrent support vector machines for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Simplifying long short-term memory acoustic models for fast training and decoding.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exploring multidimensional lstms for large vocabulary ASR.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
SVD-based universal DNN modeling for multiple scenarios.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Maximum a posteriori adaptation of network parameters in deep models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Rapid adaptation for deep neural networks through multi-task learning.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Small-footprint high-performance deep neural network-based speech recognition using split-VQ.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

An analysis of convolutional neural networks for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

LSTM time and frequency recurrence for automatic speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
An Overview of Noise-Robust Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Variable-activation and variable-input deep neural network for robust speech recognition.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Variable-component deep neural network for robust speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Learning small-size DNN with output-distribution-based criteria.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Feature space maximum a posteriori linear regression for adaptation of deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network.
Proceedings of the IEEE International Conference on Acoustics, 2014

Feature compensation using linear combination of speaker and environment dependent correction vectors.
Proceedings of the IEEE International Conference on Acoustics, 2014

Investigation of maxout networks for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Factorized adaptation for deep neural network.
Proceedings of the IEEE International Conference on Acoustics, 2014

2013
Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems.
IEEE Trans. Speech Audio Process., 2013

Model-based margin estimation for hidden Markov model learning and generalisation.
IET Signal Process., 2013

Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks.
Proceedings of the 1st International Conference on Learning Representations, 2013

Restructuring of deep neural network acoustic models with singular value decomposition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Investigations on hessian-free optimization for cross-entropy training of deep neural networks.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers.
Proceedings of the IEEE International Conference on Acoustics, 2013

Recent advances in deep learning for speech research at Microsoft.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Efficient VTS Adaptation Using Jacobian Approximation.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Lasso environment model combination for robust speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Improvements to VTS feature enhancement.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011
Calibration of Confidence Measures in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Feature Normalization Using Structured Full Transforms for Robust Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Maximum likelihood adaptation of histogram equalization with constraint for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2010

Unscented transform with online distortion estimation for HMM adaptation.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Shrinkage model adaptation in automatic speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Word confidence calibration using a maximum entropy model with constraints on confidence and word distributions.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions.
Comput. Speech Lang., 2009

Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
Proceedings of the 3rd International Universal Communication Symposium, 2009

A study on soft margin estimation of linear regression parameters for speaker adaptation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Ensemble speaker and speaking environment modeling approach with advanced online estimation process.
Proceedings of the IEEE International Conference on Acoustics, 2009

A study on hidden Markov model's generalization capability for speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Soft margin estimation for automatic speech recognition.
PhD thesis, 2008

Soft margin estimation with various separation levels for LVCSR.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

On a generalization of margin-based discriminative training to robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Adaptation of compressed HMM parameters for resource-constrained speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

HMM adaptation using a phase-sensitive acoustic distortion model for environment-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Approximate Test Risk Bound Minimization Through Soft Margin Estimation.
IEEE Trans. Speech Audio Process., 2007

Soft margin feature extraction for automatic speech recognition.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Detection-based ASR in the automatic speech attribute transcription project.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Approximate Test Risk Minimization Through Soft Margin Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2007

A study on soft margin estimation for LVCSR.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

High-performance hmm adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

A study on lattice rescoring with knowledge scores for automatic speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Soft margin estimation of hidden Markov model parameters.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005
Application of EαNets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition.
Proceedings of the Neural Nets, 16th Italian Workshop on Neural Nets, 2005

A study on separation between acoustic models and its applications.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

On designing and evaluating speech event detectors.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Double Gaussian based feature normalization for robust speech recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A complexity reduction of ETSI advanced front-end for DSR.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Dimensionality reduction using MCE-optimized LDA transformation.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2000
A novel search algorithm for LSF VQ.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
