2022
Continuous Streaming Multi-Talker ASR with Dual-Path Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2022
Endpoint Detection for Streaming End-to-End Multi-Talker ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Streaming End-to-End Multi-Talker Speech Recognition.
IEEE Signal Process. Lett., 2021
Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Sequence-Level Self-Teaching Regularization.
Proceedings of the IEEE International Conference on Acoustics, 2021
Internal Language Model Training for Domain-Adaptive End-To-End Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR.
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Continuous speech separation: dataset and analysis.
CoRR, 2020
Combination of End-to-End and Hybrid Models for Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Exploring Transformers for Large-Scale Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Low Latency End-to-End Streaming Speech Recognition with a Scout Network.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Semantic Mask for Transformer Based End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Exploring Pre-Training with Alignments for RNN Transducer Based End-to-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Continuous Speech Separation: Dataset and Analysis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch.
CoRR, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Improving Layer Trajectory LSTM with Future Context Frames.
Proceedings of the IEEE International Conference on Acoustics, 2019
Speech Separation Using Speaker Inventory.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
Exploring Layer Trajectory LSTM with Depth Processing Units and Attention.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
A Study of All-Convolutional Encoders for Connectionist Temporal Classification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Small-Footprint Highway Deep Neural Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
End-to-End Neural Segmental Models for Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017
Multi-task Learning with CTC and Segmental CRF for Speech Recognition.
CoRR, 2017
Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Multitask Learning with CTC and Segmental CRF for Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Multiplicative LSTM for sequence modelling.
Proceedings of the 5th International Conference on Learning Representations, 2017
Knowledge distillation for small-footprint highway networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017
2016
Top-down Tree Long Short-Term Memory Networks.
Proceedings of the NAACL HLT 2016, 2016
Small-Footprint Deep Neural Networks with Highway Connections for Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Segmental Recurrent Neural Networks for End-to-End Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Deep beamforming networks for multi-channel speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Speaker-aware training of LSTM-RNNs for acoustic modelling.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Tree Recurrent Neural Networks with Application to Language Modeling.
CoRR, 2015
A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Multi-frame factorisation for long-span acoustic modelling.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
2014
Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014
Probabilistic Linear Discriminant Analysis for Acoustic Modeling.
IEEE Signal Process. Lett., 2014
Tied Probabilistic Linear Discriminant Analysis for Speech Recognition.
CoRR, 2014
Probabilistic linear discriminant analysis with bottleneck features for speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
2013
Subspace Gaussian mixture models for automatic speech recognition.
PhD thesis, 2013
Joint Uncertainty Decoding for Noise Robust Subspace Gaussian Mixture Models.
IEEE Trans. Speech Audio Process., 2013
Noise adaptive training for subspace Gaussian mixture models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Acoustic data-driven pronunciation lexicon for large vocabulary speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013
2012
Joint uncertainty decoding with unscented transform for noise robust subspace Gaussian mixture models.
Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012
Noise Compensation for Subspace Gaussian Mixture Models.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
Regularized Subspace Gaussian Mixture Models for Speech Recognition.
IEEE Signal Process. Lett., 2011
Regularized subspace Gaussian mixture models for cross-lingual speech recognition.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011
2010
Maximum negentropy beamforming with superdirectivity.
Proceedings of the 18th European Signal Processing Conference, 2010