Zhuo Chen

Yashesh Gaur

Jinyu Li

Proceedings of the IEEE International Conference on Acoustics, 2022

Endpoint Detection for Streaming End-to-End Multi-Talker ASR.

[DOI]

Sarangarajan Parthasarathy

Jinyu Li

Yifan Gong

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Streaming End-to-End Multi-Talker Speech Recognition.

[DOI]

IEEE Signal Process. Lett., 2021

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.

[DOI]

Zhong Meng

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Sequence-Level Self-Teaching Regularization.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2021

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition.

[DOI]

Zhong Meng

Naoyuki Kanda

Yashesh Gaur

Sarangarajan Parthasarathy

Proceedings of the IEEE International Conference on Acoustics, 2021

Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Continuous speech separation: dataset and analysis.

[DOI]

CoRR, 2020

Combination of End-to-End and Hybrid Models for Speech Recognition.

[DOI]

Jeremy Heng Meng Wong

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploring Transformers for Large-Scale Speech Recognition.

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Low Latency End-to-End Streaming Speech Recognition with a Scout Network.

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semantic Mask for Transformer Based End-to-End Speech Recognition.

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition.

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Continuous Speech Separation: Dataset and Analysis.

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch.

[DOI]

CoRR, 2019

Self-Teaching Networks.

[DOI]

Eric Sun

Yifan Gong

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Layer Trajectory LSTM with Future Context Frames.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Separation Using Speaker Inventory.

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Exploring Layer Trajectory LSTM with Depth Processing Units and Attention.

[DOI]

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

A Study of All-Convolutional Encoders for Connectionist Temporal Classification.

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Small-Footprint Highway Deep Neural Networks for Speech Recognition.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2017

End-to-End Neural Segmental Models for Speech Recognition.

[DOI]

IEEE J. Sel. Top. Signal Process., 2017

Multi-task Learning with CTC and Segmental CRF for Speech Recognition.

[DOI]

CoRR, 2017

Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multitask Learning with CTC and Segmental CRF for Speech Recognition.

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multiplicative LSTM for sequence modelling.

[DOI]

Proceedings of the 5th International Conference on Learning Representations, 2017

Knowledge distillation for small-footprint highway networks.

[DOI]

Michelle Guo

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.

[DOI]

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Top-down Tree Long Short-Term Memory Networks.

[DOI]

Xingxing Zhang

Mirella Lapata

Proceedings of the NAACL HLT 2016, 2016

Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Segmental Recurrent Neural Networks for End-to-End Speech Recognition.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Deep beamforming networks for multi-channel speech recognition.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker-aware training of LSTM-RNNs for acoustic modelling.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition.

[DOI]

Xingxing Zhang

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Tree Recurrent Neural Networks with Application to Language Modeling.

[DOI]

Xingxing Zhang

Mirella Lapata

CoRR, 2015

A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition.

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models.

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multi-frame factorisation for long-span acoustic modelling.

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2014

Probabilistic Linear Discriminant Analysis for Acoustic Modeling.

[DOI]

IEEE Signal Process. Lett., 2014

Tied Probabilistic Linear Discriminant Analysis for Speech Recognition.

[DOI]

CoRR, 2014

Probabilistic linear discriminant analysis with bottleneck features for speech recognition.

[DOI]

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Subspace Gaussian mixture models for automatic speech recognition.

[DOI]

PhD thesis, 2013

Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models.

[DOI]

IEEE Trans. Speech Audio Process., 2013

Noise adaptive training for subspace Gaussian mixture models.

[DOI]

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Acoustic data-driven pronunciation lexicon for large vocabulary speech recognition.

[DOI]

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Joint uncertainty decoding with unscented transform for noise robust subspace Gaussian mixture models.

[DOI]

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Noise Compensation for Subspace Gaussian Mixture Models.

[DOI]

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition.

[DOI]

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Regularized Subspace Gaussian Mixture Models for Speech Recognition.

[DOI]

IEEE Signal Process. Lett., 2011

Regularized subspace Gaussian mixture models for cross-lingual speech recognition.

[DOI]