Jinyu Li

Proceedings of the IEEE International Conference on Acoustics, 2024

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation.

Proceedings of the IEEE International Conference on Acoustics, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

2023

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning.

CoRR, 2023

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer.

CoRR, 2023

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation.

CoRR, 2023

Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling.

CoRR, 2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers.

CoRR, 2023

LAMASSU: A Streaming Language-Agnostic Multilingual Speech Recognition and Translation Model Using Neural Transducers.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Accelerating Transducers through Adjacent Token Merging.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Fast and Accurate Factorized Neural Transducer for Text Adaption of End-to-End Speech Recognition Models.

Proceedings of the IEEE International Conference on Acoustics, 2023

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

Speaker Change Detection For Transformer Transducer ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation.

Proceedings of the IEEE International Conference on Acoustics, 2023

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2023

LongFNT: Long-Form Speech Recognition with Factorized Neural Transducer.

Proceedings of the IEEE International Conference on Acoustics, 2023

CTCBERT: Advancing Hidden-Unit BERT with CTC Objectives.

Proceedings of the IEEE International Conference on Acoustics, 2023

Speech Separation with Large-Scale Self-Supervised Learning.

Proceedings of the IEEE International Conference on Acoustics, 2023

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

On Decoder-Only Architecture For Speech-to-Text and Large Language Model Integration.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Building High-Accuracy Multilingual ASR With Gated Language Experts and Curriculum Training.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Improving Stability in Simultaneous Speech Translation: A Revision-Controllable Decoding Approach.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Towards Contextual Spelling Correction for Customization of End-to-End Speech Recognition Systems.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

IEEE J. Sel. Top. Signal Process., 2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition.

CoRR, 2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers.

CoRR, 2022

Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding.

CoRR, 2022

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task.

CoRR, 2022

Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Separating Long-Form Speech with Group-wise Permutation Invariant Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Configurable Multilingual Model is All You Need to Recognize All Languages.

Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Speech Separation with Recurrent Selective Attention Network.

Proceedings of the IEEE International Conference on Acoustics, 2022

Have Best of Both Worlds: Two-Pass Hybrid and E2E Cascading Framework for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Self-Supervised Learning for Speech Recognition with Intermediate Layer Supervision.

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction.

Proceedings of the IEEE International Conference on Acoustics, 2022

Wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2022

Endpoint Detection for Streaming End-to-End Multi-Talker ASR.

Liang Lu

Proceedings of the IEEE International Conference on Acoustics, 2022

UniSpeech-SAT: Universal Speech Representation Learning With Speaker Aware Pre-Training.

Proceedings of the IEEE International Conference on Acoustics, 2022

Factorized Neural Transducer for Efficient Language Model Adaptation.

Xie Chen

Proceedings of the IEEE International Conference on Acoustics, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

Speaker Separation Using Speaker Inventories and Estimated Speech.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Streaming End-to-End Multi-Talker Speech Recognition.

IEEE Signal Process. Lett., 2021

Self-Supervised Learning for speech recognition with Intermediate layer supervision.

CoRR, 2021

Recent Advances in End-to-End Automatic Speech Recognition.

CoRR, 2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing.

CoRR, 2021

SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing.

CoRR, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Multilingual Transformer Transducer Models by Reducing Language Confusions.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra Fast Speech Separation Model with Teacher Student Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Microsoft Speaker Diarization System for the Voxceleb Speaker Recognition Challenge 2020.

Proceedings of the IEEE International Conference on Acoustics, 2021

Ensemble Combination between Different Time Segmentations.

Jeremy Heng Meng Wong

Dimitrios Dimitriadis

Proceedings of the IEEE International Conference on Acoustics, 2021

Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.

Naoyuki Kanda

Yashesh Gaur

Proceedings of the IEEE International Conference on Acoustics, 2021

Continuous Speech Separation with Conformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

Developing Real-Time Streaming Transformer Transducer for Speech Recognition on Large-Scale Dataset.

Proceedings of the IEEE International Conference on Acoustics, 2021

Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer.

Proceedings of the IEEE International Conference on Acoustics, 2021

On Addressing Practical Challenges for RNN-Transducer.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer.

CoRR, 2020

Adaptation Algorithms for Speech Recognition: An Overview.

CoRR, 2020

Continuous Speech Separation with Conformer.

CoRR, 2020

Continuous speech separation: dataset and analysis.

CoRR, 2020

An End-to-End Architecture of Online Multi-Channel Speech Separation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Combination of End-to-End and Hybrid Models for Speech Recognition.

Jeremy Heng Meng Wong

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploring Transformers for Large-Scale Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

On the Comparison of Popular End-to-End Models for Large Scale Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Sequence-Level Self-Learning with Multiple Hypotheses.

Ken'ichi Kumatani

Dimitrios Dimitriadis

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transfer Learning Approaches for Streaming End-to-End Speech Recognition System.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Rapid RNN-T Adaptation Using Personalized Speech Synthesis and Neural Language Generator.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency End-to-End Streaming Speech Recognition with a Scout Network.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semantic Mask for Transformer Based End-to-End Speech Recognition.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

L-Vector: Neural Label Embedding for Domain Adaptation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model.

Rui Zhao

Eric Sun

Jeremy Heng Meng Wong

Amit Das

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Using Personalized Speech Synthesis and Neural Language Generator for Rapid Speaker Adaptation.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Advancing Acoustic-to-Word CTC Model With Attention and Mixed-Units.

IEEE ACM Trans. Audio Speech Lang. Process., 2019

Layer Trajectory BLSTM.

Eric Sun

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Adaptation for Attention-Based End-to-End Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Acoustic-to-Phrase Models for Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Adversarial Speaker Verification.

Proceedings of the IEEE International Conference on Acoustics, 2019

Conditional Teacher-student Learning.

Proceedings of the IEEE International Conference on Acoustics, 2019

Attentive Adversarial Learning for Domain-invariant Training.

Proceedings of the IEEE International Conference on Acoustics, 2019

Adversarial Speaker Adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2019

Towards Code-switching ASR for End-to-end CTC Models.

Proceedings of the IEEE International Conference on Acoustics, 2019

Improving Layer Trajectory LSTM with Future Context Frames.

Proceedings of the IEEE International Conference on Acoustics, 2019

Universal Acoustic Modeling Using Neural Mixture Models.

Proceedings of the IEEE International Conference on Acoustics, 2019

CNN with Phonetic Attention for Text-Independent Speaker Verification.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Speech Separation Using Speaker Inventory.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Character-Aware Attention-Based End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improving RNN Transducer Modeling for End-to-End Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Progressive Joint Modeling in Unsupervised Single-Channel Overlapped Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2018

Recent Progresses in Deep Learning based Acoustic Models (Updated).

Dong Yu

CoRR, 2018

Speaker-Invariant Training via Adversarial Learning.

CoRR, 2018

Speaker Adaptation for End-to-End CTC Models.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Exploring Layer Trajectory LSTM with Depth Processing Units and Attention.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Adversarial Feature-Mapping for Speech Enhancement.

Biing-Hwang Fred Juang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Cycle-Consistent Speech Enhancement.

Biing-Hwang Fred Juang

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Layer Trajectory LSTM.

Changliang Liu

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Improved Training for Online End-to-end Speech Recognition Systems.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Domain and Speaker Adaptation for Cortana Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Speaker-Invariant Training Via Adversarial Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Developing Far-Field Speaker System Via Teacher-Student Learning.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Advancing Acoustic-to-Word CTC Model.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Advancing Connectionist Temporal Classification with Attention Modeling.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Recent progresses in deep learning based acoustic models.

Dong Yu

IEEE CAA J. Autom. Sinica, 2017

Large-Scale Domain Adaptation via Teacher-Student Learning.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving Mask Learning Based Speech Enhancement System with Restoration Layers and Residual Connection.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Extended low-rank plus diagonal adaptation for deep and recurrent neural networks.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition.

Yan Huang

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Unsupervised adaptation with domain separation networks for robust speech recognition.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Acoustic-to-word model without OOV.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Cracking the cocktail party problem by multi-beam deep attractor network.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Challenges in and Solutions to Deep Learning Network Acoustic Modeling in Speech Recognition Products at Microsoft.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Learning Hidden Unit Contributions for Unsupervised Acoustic Model Adaptation.

Pawel Swietojanski

Steve Renals

IEEE ACM Trans. Audio Speech Lang. Process., 2016

End-to-End attention based text-dependent speaker verification.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Low-rank plus diagonal adaptation for deep neural networks.

Yong Zhao

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Recurrent support vector machines for speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Simplifying long short-term memory acoustic models for fast training and decoding.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exploring multidimensional LSTMs for large vocabulary ASR.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

SVD-based universal DNN modeling for multiple scenarios.

Changliang Liu

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Maximum a posteriori adaptation of network parameters in deep models.

Zhen Huang

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Rapid adaptation for deep neural networks through multi-task learning.

Zhen Huang

I-Fan Chen

Ji Wu

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Small-footprint high-performance deep neural network-based speech recognition using split-VQ.

Yongqiang Wang

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

An analysis of convolutional neural networks for speech recognition.

Jui-Ting Huang

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

LSTM time and frequency recurrence for automatic speech recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

An Overview of Noise-Robust Automatic Speech Recognition.

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Variable-activation and variable-input deep neural network for robust speech recognition.

Rui Zhao

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Variable-component deep neural network for robust speech recognition.

Rui Zhao

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Learning small-size DNN with output-distribution-based criteria.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Feature space maximum a posteriori linear regression for adaptation of deep neural networks.

Zhen Huang

I-Fan Chen

Chao Weng

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network.

Proceedings of the IEEE International Conference on Acoustics, 2014

Feature compensation using linear combination of speaker and environment dependent correction vectors.

Proceedings of the IEEE International Conference on Acoustics, 2014

Investigation of maxout networks for speech recognition.

Pawel Swietojanski

Jui-Ting Huang

Proceedings of the IEEE International Conference on Acoustics, 2014

Factorized adaptation for deep neural network.

Jui-Ting Huang

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems.

IEEE Trans. Speech Audio Process., 2013

Model-based margin estimation for hidden Markov model learning and generalisation.

IET Signal Process., 2013

Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks.

Proceedings of the 1st International Conference on Learning Representations, 2013

Restructuring of deep neural network acoustic models with singular value decomposition.

Jian Xue

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Investigations on Hessian-free optimization for cross-entropy training of deep neural networks.

Simon Wiesler

Jian Xue

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers.

Proceedings of the IEEE International Conference on Acoustics, 2013

Recent advances in deep learning for speech research at Microsoft.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Efficient VTS Adaptation Using Jacobian Approximation.

Michael L. Seltzer

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Lasso environment model combination for robust speech recognition.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Improvements to VTS feature enhancement.

Michael L. Seltzer

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Calibration of Confidence Measures in Speech Recognition.

Dong Yu

Li Deng

IEEE ACM Trans. Audio Speech Lang. Process., 2011

Feature Normalization Using Structured Full Transforms for Robust Speech Recognition.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Maximum likelihood adaptation of histogram equalization with constraint for robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.

IEEE Trans. Speech Audio Process., 2010

Unscented transform with online distortion estimation for HMM adaptation.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Shrinkage model adaptation in automatic speech recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Word confidence calibration using a maximum entropy model with constraints on confidence and word distributions.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009

A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions.

Comput. Speech Lang., 2009

Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.

Proceedings of the 3rd International Universal Communication Symposium, 2009

A study on soft margin estimation of linear regression parameters for speaker adaptation.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Ensemble speaker and speaking environment modeling approach with advanced online estimation process.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

A study on hidden Markov model's generalization capability for speech recognition.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Soft margin estimation for automatic speech recognition.

PhD thesis, 2008

Soft margin estimation with various separation levels for LVCSR.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

On a generalization of margin-based discriminative training to robust speech recognition.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Adaptation of compressed HMM parameters for resource-constrained speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

HMM adaptation using a phase-sensitive acoustic distortion model for environment-robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007

Approximate Test Risk Bound Minimization Through Soft Margin Estimation.

Ming Yuan

IEEE Trans. Speech Audio Process., 2007

Soft margin feature extraction for automatic speech recognition.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Detection-based ASR in the automatic speech attribute transcription project.

Antonio Moreno-Daniel

Jeremy Morris

Yu Wang

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Approximate Test Risk Minimization Through Soft Margin Estimation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

A study on soft margin estimation for LVCSR.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

High-performance HMM adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.

Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

A study on lattice rescoring with knowledge scores for automatic speech recognition.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Soft margin estimation of hidden Markov model parameters.

Ming Yuan

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

Application of EαNets to Feature Recognition of Articulation Manner in Knowledge-Based Automatic Speech Recognition.

Proceedings of the Neural Nets, 16th Italian Workshop on Neural Nets, 2005

A study on separation between acoustic models and its applications.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

On designing and evaluating speech event detectors.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition.

Proceedings of the 2005 IEEE International Conference on Acoustics, Speech and Signal Processing, 2005

2004

Double Gaussian based feature normalization for robust speech recognition.

Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A complexity reduction of ETSI advanced front-end for DSR.

Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Dimensionality reduction using MCE-optimized LDA transformation.

Xiao-Bing Li

Jin-Yu Li

Ren-Hua Wang

Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

2000

A novel search algorithm for LSF VQ.
