Khe Chai Sim

Orcid: 0000-0002-0866-2223

According to our database, Khe Chai Sim authored at least 139 papers between 2004 and 2024.





TransformerFAM: Feedback attention is working memory.
CoRR, 2024

Hierarchical Recurrent Adapters for Efficient Multi-Task Adaptation of Large Speech Models.
CoRR, 2024

Aligner-Encoders: Self-Attention Transformers Can Be Self-Transducers.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Massive End-to-end Speech Recognition Models with Time Reduction.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Speech Recognition for African American English with Audio Classification.
Proceedings of the IEEE International Conference on Acoustics, 2024

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning.
CoRR, 2023

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm.
CoRR, 2023

Massive End-to-end Models for Short Search Queries.
CoRR, 2023

Edit Distance based RL for RNNT decoding.
CoRR, 2023

Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Re-investigating the Efficient Transfer Learning of Speech Foundation Model using Feature Fusion Methods.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Efficient Domain Adaptation for Speech Foundation Models.
Proceedings of the IEEE International Conference on Acoustics, 2023

Comparison of Soft and Hard Target RNN-T Distillation for Large-Scale ASR.
Proceedings of the IEEE International Conference on Acoustics, 2023

Resource-Efficient Transfer Learning from Speech Foundation Model Using Hierarchical Feature Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Contextual Spelling Correction with Large Language Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2022

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning.
CoRR, 2022

Internal Language Model Personalization of E2E Automatic Speech Recognition Using Random Encoder Features.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Context-Aware Neural Confidence Estimation for Rare Word Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NAM+: Towards Scalable End-to-End Contextual Biasing for Adaptive ASR.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

On-the-fly ASR Corrections with Audio Exemplars.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pseudo Label Is Better Than Human Label.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Incremental Layer-Wise Self-Supervised Learning for Efficient Unsupervised Speech Domain Adaptation On Device.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

UserLibri: A Dataset for ASR Personalization Using Only Text.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Large-Scale ASR Domain Adaptation Using Self- and Semi-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Joint Unsupervised and Supervised Training for Multilingual ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022

Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device.
CoRR, 2021

On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech.
CoRR, 2021

Robust Continuous On-Device Personalization for Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Comparison of Supervised and Unsupervised Pre-Training of End-to-End Models.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Low-Rank Gradient Approximation for Memory-Efficient on-Device Training of Deep Neural Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Investigation into On-Device Personalization of End-to-End Automatic Speech Recognition Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving CTC Using Stimulated Learning for Sequence Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

Personalization of End-to-End Speech Recognition on Mobile Devices for Named Entities.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Improving Interpretability and Regularization in Deep Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Toward Domain-Invariant Speech Recognition via Large Scale Training.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Efficient Implementation of Recurrent Neural Network Transducer in TensorFlow.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Learning Effective Factorized Hidden Layer Bases Using Student-Teacher Training for LSTM Acoustic Model Adaptation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Dialect Speech Recognition with a Single Sequence-to-Sequence Model.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Understanding Recurrent Neural State Using Memory Signatures.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model.
CoRR, 2017

An Efficient Phone N-Gram Forward-Backward Computation Using Dense Matrix Multiplication.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Learning Factorized Transforms for Unsupervised Adaptation of LSTM-RNN Acoustic Models.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Modeling for Google Home.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

An investigation into learning effective speaker subspaces for robust unsupervised DNN adaptation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Improving the efficiency of forward-backward algorithm using batched computation in TensorFlow.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Factorized Hidden Layer Adaptation for Deep Neural Network Based Acoustic Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Sensitivity-Characterised Activity Neurogram (SCAN) for Visualising and Understanding the Inner Workings of Deep Neural Network.
IEICE Trans. Inf. Syst., 2016

Learning utterance-level normalisation using Variational Autoencoders for robust automatic speech recognition.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Low-rank bases for factorized hidden layer adaptation of DNN acoustic models.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Entropy-based pruning of hidden units to reduce DNN parameters.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Stimulated Deep Neural Network for Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Multi-Attribute Factorized Hidden Layer Adaptation for DNN Acoustic Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Subspace LHUC for Fast Adaptation of Deep Neural Network Acoustic Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Microphone Distance Adaptation Using Cluster Adaptive Training for Robust Far Field Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Incorporating a Generative Front-End Layer to Deep Neural Network for Noise Robust Automatic Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Towards implicit complexity control using variable-depth deep neural networks for automatic speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker-aware training of LSTM-RNNs for acoustic modelling.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Joint acoustic factor learning for robust deep neural network based automatic speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving the interpretability of deep neural networks with stimulated learning.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

On constructing and analysing an interpretable brain model for the DNN based on hidden activity patterns.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Learning factorized feature transforms for speaker normalization.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Regression-Based Context-Dependent Modeling of Deep Neural Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Temporally Varying Weight Regression: A Semi-Parametric Trajectory Model for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

A multimodal stroke-based predictive input for efficient Chinese text entry on mobile devices.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Joint adaptation and adaptive training of TVWR for robust automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Modeling long temporal contexts for robust DNN-based speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Refinements of regression-based context-dependent modelling of deep neural networks for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

On combining DNN and GMM with unsupervised speaker adaptation for robust automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

An ideal hidden-activation mask for deep neural networks based noise-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Second order vector Taylor series based robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Combining Punctuation and Disfluency Prediction: An Empirical Study.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

A Beam-Search Decoder for Disfluency Detection.
Proceedings of the COLING 2014, 2014

Integrating conditional random fields and joint multi-gram model with syllabic features for grapheme-to-phone conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An investigation of temporally varying weight regression for noise robust speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Parameter clustering for temporally varying weight regression for automatic speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

An investigation of spectral restoration algorithms for deep neural networks based noise robust speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Approximated Parallel Model Combination for efficient noise-robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Noise adaptive front-end normalization based on Vector Taylor Series for Deep Neural Networks in robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Context-dependent modelling of deep neural network using logistic regression.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Multi-stream temporally varying weight regression for cross-lingual speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Improving robustness of deep neural networks via spectral masking for automatic speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Context dependent acoustic keyword spotting using deep neural network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

The NUS sung and spoken lyrics corpus: A quantitative comparison of singing and speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

MOGAT: mobile games with auditory training for children with cochlear implants.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

A Weighted Combination of Speech with Text-based Models for Arabic Diacritization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improving Mandarin predictive text input by augmenting pinyin initials with speech and tonal information.
Proceedings of the International Conference on Multimodal Interaction, 2012

ICMI'12 grand challenge: haptic voice recognition.
Proceedings of the International Conference on Multimodal Interaction, 2012

Speak-as-you-swipe (SAYS): a multimodal interface combining speech and gesture keyboard synchronously for continuous mobile text entry.
Proceedings of the International Conference on Multimodal Interaction, 2012

Design and implementation of the note-taking style haptic voice recognition for mobile devices.
Proceedings of the International Conference on Multimodal Interaction, 2012

An investigation of tied-mixture GMM based triphone state clustering.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Implicit trajectory modelling using temporally varying weight regression for automatic speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition.
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8-14, 2012, Jeju Island, Korea, 2012

Using Discrete Probabilities With Bhattacharyya Measure for SVM-Based Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Comparison of Smoothing Techniques for Robust Context Dependent Acoustic Modelling in Hybrid NN/HMM Systems.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Sequential Classification Criteria for NNs in Automatic Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Trajectory-based Parallel Model Combination with a unified static and dynamic parameter compensation for noisy speech recognition.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Word level automatic alignment of music and lyrics using vocal synthesis.
ACM Trans. Multim. Comput. Commun. Appl., 2010

Statistical lattice-based spoken document retrieval.
ACM Trans. Inf. Syst., 2010

Haptic Voice Recognition: Augmenting speech modality with touch events for efficient speech recognition.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Semi-parametric trajectory modelling using temporally varying feature mapping for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Probabilistic state clustering using conditional random field for context-dependent acoustic modelling.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Hidden logistic linear regression for support vector machine based phone verification.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Adaptive score fusion using Weighted Logistic Linear Regression for spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

A minimum variance asynchronous Detection Error Trade-off performance analysis for multi-class detection problems.
Proceedings of the IEEE International Conference on Acoustics, 2010

Discrete expected likelihood kernel for SVM-based speaker verification.
Proceedings of the 18th European Signal Processing Conference, 2010

Improving phone verification using state-level posterior features and support vector machine for automatic mispronunciation detection.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Stream-based context-sensitive phone mapping for cross-lingual speech recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Discriminative Product-of-Expert acoustic mapping for cross-lingual phone recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

On Acoustic Diversification Front-End for Spoken Language Identification.
IEEE Trans. Speech Audio Process., 2008

A lattice-based approach to query-by-example spoken document retrieval.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR.
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation, 2008

Context-sensitive probabilistic phone mapping model for cross-lingual speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Robust phone set mapping using decision tree clustering for cross-lingual phone recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Discriminative semi-parametric trajectory model for speech recognition.
Comput. Speech Lang., 2007

Fusion of contrastive acoustic models for parallel phonotactic spoken language identification.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Improving Speech Transcription for Mandarin-English Translation.
Proceedings of the IEEE International Conference on Acoustics, 2007

Consensus Network Decoding for Statistical Machine Translation System Combination.
Proceedings of the IEEE International Conference on Acoustics, 2007

Semantic Transliteration of Personal Names.
Proceedings of the ACL 2007, 2007

Minimum phone error training of precision matrix models.
IEEE Trans. Speech Audio Process., 2006

The CU-HTK Mandarin Broadcast News Transcription System.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Temporally varying model parameters for large vocabulary continuous speech recognition.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Adaptation of Precision Matrix Models on Large Vocabulary Continuous Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Investigation of Acoustic Modeling Techniques for LVCSR Systems.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Development of the CU-HTK 2004 Broadcast News Transcription Systems.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Development of the CUHTK 2004 Mandarin Conversational Telephone Speech Transcription System.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Basis superposition precision matrix modelling for large vocabulary continuous speech recognition.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
