2025
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition.
CoRR, January, 2025
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models.
CoRR, January, 2025
2024
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition.
CoRR, 2024
Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models.
CoRR, 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC.
CoRR, 2024
Exploring SSL Discrete Tokens for Multilingual ASR.
CoRR, 2024
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.
CoRR, 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions.
CoRR, 2024
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System.
CoRR, 2024
Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation.
CoRR, 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement.
CoRR, 2024
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model.
CoRR, 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition.
CoRR, 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask.
CoRR, 2024
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition.
CoRR, 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model.
CoRR, 2024
Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking With Self-Supervised Learning Features.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024
Enhancing Pre-Trained ASR System Fine-Tuning for Dysarthric Speech Recognition Using Adversarial Data Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2024
Towards Automatic Data Augmentation for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.
Proceedings of the IEEE International Conference on Acoustics, 2024
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
2023
Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Unsupervised Model-Based Speaker Adaptation of End-To-End Lattice-Free MMI Model for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Exploiting Prompt Learning with Pre-Trained Language Models for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
2022
Bayesian Neural Network Language Modeling for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
On the similarities of representations in artificial and brain neural networks for speech recognition.
Frontiers Comput. Neurosci., 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.
CoRR, 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
CoRR, 2022
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
CoRR, 2022
On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
CoRR, 2022
Investigation of Deep Neural Network Acoustic Modelling Approaches for Low Resource Accented Mandarin Speech Recognition.
CoRR, 2022
Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Context-aware Multimodal Fusion for Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Proceedings of the IEEE International Conference on Acoustics, 2022
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Neural Architecture Search for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Bayesian Learning for Deep Neural Network Adaptation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Speech Emotion Recognition Using Sequential Capsule Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Exemplar-Based Emotive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Recent Progress in the CUHK Dysarthric Speech Recognition System.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
CoRR, 2021
Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Adversarial Data Augmentation for Disordered Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Proceedings of the IEEE International Conference on Acoustics, 2021
Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2021
Bayesian Transformer Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Mixed Precision Quantization of Transformer Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021
Replay and Synthetic Speech Detection with Res2Net Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2021
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, 2021
Understanding the wiring evolution in differentiable neural architecture search.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021
2020
Cross-Domain Deep Visual Feature Generation for Mandarin Audio-Visual Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Neural Architecture Search for Speech Recognition.
CoRR, 2020
Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020
Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Speaker-Aware Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Transferring Source Style in Non-Parallel Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
End-To-End Accent Conversion Without Using Native Utterances.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
DSNAS: Direct Neural Architecture Search Without Parameter Retraining.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
On the Use of Pitch Features for Disordered Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
The CUHK Dysarthric Speech Recognition Systems for English and Cantonese.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Recurrent Neural Network Language Model Training Using Natural Gradient.
Proceedings of the IEEE International Conference on Acoustics, 2019
BLHUC: Bayesian Learning of Hidden Unit Contributions for Deep Neural Network Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2019
Speech Emotion Recognition Using Capsule Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019
CNN-RNN-CTC Based End-to-end Mispronunciation Detection and Diagnosis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
End-to-end Code-switched TTS with Mix of Monolingual Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2019
2018
The HCCL-CUHK System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Investigation of Stacked Deep Neural Networks and Mixture Density Networks for Acoustic-to-Articulatory Inversion.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Gaussian Process Neural Networks for Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Feature Based Adaptation for Speaking Style Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Drawing-Based Automatic Dementia Screening Using Gaussian Process Markov Chains.
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018
2017
Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem.
PLoS Comput. Biol., 2017
Future Word Contexts in Neural Network Language Models.
CoRR, 2017
RNN-LDA Clustering for Feature Based DNN Adaptation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Investigating Bidirectional Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Recurrent neural network language models for keyword search.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Multimodal learning using 3D audio-visual data for audio-visual speech recognition.
Proceedings of the 2017 International Conference on Asian Language Processing, 2017
2016
Two Efficient Lattice Rescoring Methods Using Recurrent Neural Network Language Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Efficient Training and Evaluation of Recurrent Neural Network Language Models for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Deep Neural Network Based Acoustic-to-Articulatory Inversion Using Phone Sequence Information.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Convolutional neural network bottleneck features for bi-directional generalized variable parameter HMMs.
Proceedings of the IEEE International Conference on Information and Automation, 2016
Improved DNN-based segmentation for multi-genre broadcast audio.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
CUED-RNNLM - An open-source toolkit for efficient training and evaluation of recurrent neural network language models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Automatic Complexity Control of Generalized Variable Parameter HMMs for Noise Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Generalized variable parameter HMMs based acoustic-to-articulatory inversion.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Efficient use of DNN bottleneck features in generalized variable parameter HMMs for noise robust speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
The Cambridge University 2014 BOLT conversational telephone Mandarin Chinese LVCSR system for speech translation.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Recurrent neural network language model adaptation for multi-genre broadcast speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Investigations of low resource multi-accent Mandarin speech recognition.
Proceedings of the IEEE International Conference on Information and Automation, 2015
Paraphrastic recurrent neural network language models.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Recurrent neural network language model training with noise contrastive estimation for speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Improving the training and evaluation efficiency of recurrent neural network language models.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Cambridge University transcription systems for the multi-genre broadcast challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
The development of the Cambridge University alignment systems for the multi-genre broadcast challenge.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Speaker diarisation and longitudinal linking in multi-genre broadcast data.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Investigation of back-off based interpolation between recurrent neural network and n-gram language models.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
The MGB challenge: Evaluating multi-genre broadcast media recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
Paraphrastic language models.
Comput. Speech Lang., 2014
Deep neural network bottleneck features for generalized variable parameter HMMs.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Efficient lattice rescoring using recurrent neural network language models.
Proceedings of the IEEE International Conference on Acoustics, 2014
Paraphrastic neural network language models.
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
Language model cross adaptation for LVCSR system combination.
Comput. Speech Lang., 2013
Use of contexts in language model interpolation and adaptation.
Comput. Speech Lang., 2013
Improving lightly supervised training for broadcast transcription.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Cross-domain paraphrasing for improving language modelling using out-of-domain data.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Feature space generalized variable parameter HMMs for noise robust recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Automatic Transcription of Multi-genre Media Archives.
Proceedings of the First Workshop on Speech, 2013
Paraphrastic language models and combination with neural network language models.
Proceedings of the IEEE International Conference on Acoustics, 2013
Automatic model complexity control for generalized variable parameter HMMs.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013
2012
Transcription of multi-genre media archives using out-of-domain data.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Structured modeling based on generalized variable parameter HMMs and speaker adaptation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
2011
A flexible framework for HMM based noise robust speech recognition using generalized parametric space polynomial regression.
Sci. China Inf. Sci., 2011
Improving LVCSR System Combination Using Neural Network Language Model Cross Adaptation.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Word Boundary Modelling and Full Covariance Gaussians for Arabic Speech-to-Text Systems.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Generalized Variable Parameter HMMs for Noise Robust Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Investigation of acoustic units for LVCSR systems.
Proceedings of the IEEE International Conference on Acoustics, 2011
2010
Improved neural network based language modelling and adaptation.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Language model combination and adaptation using weighted finite state transducers.
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
Exploiting Chinese character models to improve speech recognition performance.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
2008
Context dependent language model adaptation.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
2007
Automatic Model Complexity Control Using Marginalized Discriminative Growth Functions.
IEEE Trans. Speech Audio Process., 2007
Improving Speech Transcription for Mandarin-English Translation.
Proceedings of the IEEE International Conference on Acoustics, 2007
Speech Recognition System Combination for Machine Translation.
Proceedings of the IEEE International Conference on Acoustics, 2007
Discriminative language model adaptation for Mandarin broadcast speech transcription and translation.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
Corrections to "Automatic Transcription of Conversational Telephone Speech".
IEEE Trans. Speech Audio Process., 2006
The CU-HTK Mandarin Broadcast News Transcription System.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
Automatic transcription of conversational telephone speech.
IEEE Trans. Speech Audio Process., 2005
Investigation of Acoustic Modeling Techniques for LVCSR Systems.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Development of the CUHTK 2004 Mandarin Conversational Telephone Speech Transcription System.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
2004
Model complexity control and compression using discriminative growth functions.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
Development of the 2003 CU-HTK conversational telephone speech transcription system.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2003
Automatic complexity control for HLDA systems.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003