Chin-Hui Lee

Orcid: 0000-0002-1892-2551

Affiliations:
  • Georgia Institute of Technology, School of Electrical and Computer Engineering, USA
  • Bell Laboratories, Dialogue Systems Research Department, Murray Hill, New Jersey, USA (1981-2001)


According to our database, Chin-Hui Lee authored at least 283 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Collaborative Viseme Subword and End-to-End Modeling for Word-Level Lip Reading.
IEEE Trans. Multim., 2024

A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Quality-Aware End-to-End Audio-Visual Neural Speaker Diarization.
CoRR, 2024

An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement.
CoRR, 2024

Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design.
CoRR, 2024

Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition.
CoRR, 2024

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
CoRR, 2024

Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori.
CoRR, 2024

Summary on the Chat-Scenario Chinese Lipreading (ChatCLR) Challenge.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection in Low-Resource Realistic Scenarios.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints.
Proceedings of the IEEE International Conference on Acoustics, 2024

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2024

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024

A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
Speech Commun., September, 2023

Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
Speech Commun., July, 2023

A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

QDM-SSD: Quality-Aware Dynamic Masking for Separation-Based Speaker Diarization.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
CoRR, 2023

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge.
CoRR, 2023

AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization And Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Loss Function Design for DNN-Based Sound Event Localization and Detection on Low-Resource Realistic Data.
Proceedings of the IEEE International Conference on Acoustics, 2023

An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Incorporating Lip Features into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

Semi-Supervised Multi-Channel Speaker Diarization With Cross-Channel Attention.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Enhancing Privacy Preservation with Quantum Computing for Word-Level Audio-Visual Speech Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Improving Sound Event Localization and Detection with Class-Dependent Sound Separation for Real-World Scenarios.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Deep Segment Model for Acoustic Scene Classification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Audio-Visual Neural Speaker Diarization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Separation-Based Speaker Diarization Via Iterative Model Refinement And Speaker Embedding Based Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer.
Proceedings of the IEEE International Conference on Acoustics, 2022

The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021

Separation Guided Speaker Diarization in Realistic Mismatched Conditions.
CoRR, 2021

A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification.
CoRR, 2021

USTC-NELSLIP System Description for DIHARD-III Challenge.
CoRR, 2021

Acoustic Modeling for Multi-Array Conversational Speech Recognition in the CHiME-6 Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Speech Emotion Recognition Based on Acoustic Segment Model.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

A Model Ensemble Approach for Sound Event Localization and Detection.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Scenario-Dependent Speaker Diarization for DIHARD-III Challenge.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Enhancement Autoencoder with Hierarchical Latent Structure.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Two-Stage Approach to Device-Robust Acoustic Scene Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network-Based Vector-to-Vector Regression.
IEEE Trans. Signal Process., 2020

A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

On Mean Absolute Error for Deep Neural Network Based Vector-to-Vector Regression.
IEEE Signal Process. Lett., 2020

Lip-reading with Hierarchical Pyramidal Convolution and Self-Attention.
CoRR, 2020

Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation.
CoRR, 2020

Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Enhanced Adversarial Strategically-Timed Attacks Against Deep Reinforcement Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Geometry Constrained Progressive Learning for LSTM-Based Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Progressive Multi-Target Network Based Speech Enhancement with SNR-Preselection for Robust Speaker Diarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for DNN-Based Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

L-Vector: Neural Label Embedding for Domain Adaptation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Performance Analysis for Tensor-Train Decomposition to Deep Neural Network Based Vector-to-Vector Regression.
Proceedings of the 54th Annual Conference on Information Sciences and Systems, 2020

2019
Speech Enhancement Based on Teacher-Student Deep Learning Using Improved Speech Presence Probability for Noise-Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

An iterative mask estimation approach to deep learning based multi-channel speech recognition.
Speech Commun., 2019

A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge.
IEEE J. Sel. Top. Signal Process., 2019

Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

DNN Training Based on Classic Gain Function for Single-channel Speech Enhancement and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Two-stage Single-channel Speaker-dependent Speech Separation Approach for CHiME-5 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2019

Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Speech Enhancement Neural Network Architecture with SNR-Progressive Multi-Target Learning for Robust Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

A LSTM-Based Joint Progressive Learning Framework for Simultaneous Speech Dereverberation and Denoising.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.
J. Signal Process. Syst., 2018

A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech.
J. Signal Process. Syst., 2018

Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.
J. Signal Process. Syst., 2018

A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR.
CoRR, 2018

Two-Stage Enhancement of Noisy and Reverberant Microphone Array Speech for Automatic Speech Recognition Systems Trained with Only Clean Speech.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Progressive Deep Learning Approach to Child Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Speaker Diarization with Enhancing Speech for the First DIHARD Challenge.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Densely Connected Progressive Learning for LSTM-Based Speech Enhancement.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Online LSTM-based Iterative Mask Estimation for Multi-Channel Speech Enhancement and ASR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
A Deep Denoising Autoencoder Approach to Improving the Intelligibility of Vocoded Speech in Cochlear Implant Simulation.
IEEE Trans. Biomed. Eng., 2017

A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments.
Speech Commun., 2017

Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation.
Pattern Recognit. Lett., 2017

An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017

A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation.
EURASIP J. Adv. Signal Process., 2017

An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech.
Comput. Speech Lang., 2017

On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A transfer learning and progressive stacking approach to reducing deep model sizes with an application to speech enhancement.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A unified deep modeling approach to simultaneous speech dereverberation and recognition for the REVERB challenge.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Joint noise and mask aware training for DNN-based speech enhancement with sub-band features.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

Multiple-target deep learning for LSTM-RNN based speech enhancement.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017

LSTM-based iterative mask estimation and post-processing for multi-channel speech enhancement.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
A Keyword-Aware Language Modeling Approach to Spoken Keyword Search.
J. Signal Process. Syst., 2016

A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition.
Neurocomputing, 2016

Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016

Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

An experimental study on joint modeling of mixed-bandwidth data via deep neural networks for robust speech recognition.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016

Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exemplar-inspired strategies for low-resource spoken keyword search in Swahili.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A study on sampling of STFT modifications in time and frequency domains for DNN-based speech dereverberation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Deep neural network based voice conversion with a large synthesized parallel corpus.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Unsupervised single-channel speech separation via deep neural network for different gender mixtures.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Zero resource anti-spoofing detection for unit selection based synthetic speech using image spectrogram artifacts.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Towards a direct Bayesian adaptation framework for deep models.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
A Regression Approach to Speech Enhancement Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

A Probabilistic Framework for Representing Dialog Systems and Entropy-Based Dialog Management Through Dynamic Stochastic State Evolution.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Maximum a Posteriori Adaptation of Network Parameters in Deep Models.
CoRR, 2015

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An entropy minimization framework for goal-driven dialogue management.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A universal VAD based on jointly trained deep neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Maximum a posteriori adaptation of network parameters in deep models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Rapid adaptation for deep neural networks through multi-task learning.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Language-resource independent speech segmentation using cues from a spectrogram image.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Joint training of front-end and back-end deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A keyword-aware grammar framework for LVCSR-based spoken keyword search.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Low-resource keyword search strategies for Tamil.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
An Efficient Gradient-based Approach to Optimizing Average Precision Through Maximal Figure-of-Merit Learning.
J. Signal Process. Syst., 2014

A MAP-based Online Estimation Approach to Ensemble Speaker and Speaking Environment Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

An Experimental Study on Speech Enhancement Based on Deep Neural Networks.
IEEE Signal Process. Lett., 2014

An artificial neural network approach to automatic speech processing.
Neurocomputing, 2014

Cross-language transfer learning for deep neural network based speech enhancement.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A novel keyword+LVCSR-filler based grammar network representation for spoken keyword search.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Dynamic noise aware training for speech enhancement based on deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Feature space maximum a posteriori linear regression for adaptation of deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Robust speech recognition with speech enhanced deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Dialect levelling in Finnish: a universal speech attribute approach.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A maximal figure-of-merit learning approach to maximizing mean average precision with deep neural network based classifiers.
Proceedings of the IEEE International Conference on Acoustics, 2014

Deep learning vector quantization for acoustic information retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2014

An i-vector based descriptor for alphabetical gesture recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Attribute based lattice rescoring in spontaneous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Introducing attribute features to foreign accent recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014

2013
A Bottom-Up Modular Search Approach to Large Vocabulary Continuous Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems.
IEEE Trans. Speech Audio Process., 2013

Speech Recognition Using Long-Span Temporal Patterns in a Deep Network Model.
IEEE Signal Process. Lett., 2013

An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition.
Proc. IEEE, 2013

Exploiting deep neural networks for detection-based speech recognition.
Neurocomputing, 2013

Model-based margin estimation for hidden Markov model learning and generalisation.
IET Signal Process., 2013

Universal attribute characterization of spoken languages for automatic spoken language recognition.
Comput. Speech Lang., 2013

A blind segmentation approach to acoustic event detection based on i-vector.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Minimax i-vector extractor for short duration speaker verification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Knowledge integration for improving performance in LVCSR.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A particle filter compensation approach to robust LVCSR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

An experimental study on structural-MAP approaches to implementing very large vocabulary speech recognition systems for real-world tasks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Experiments on Cross-Language Attribute Detection and Phone Recognition With Minimal Target-Specific Training Data.
IEEE Trans. Speech Audio Process., 2012

A new confidence measure combining Hidden Markov Models and Artificial Neural Networks of phonemes for effective keyword spotting.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A study on cross-language knowledge integration in Mandarin LVCSR.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Hermitian based Hidden Activation Functions for Adaptation of Hybrid HMM/ANN Models.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Consumer-level multimedia event detection through unsupervised audio signal modeling.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Per-Exemplar Fusion Learning for Video Retrieval and Recounting.
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Explicit Performance Metric Optimization for Fusion-Based Video Retrieval.
Proceedings of the Computer Vision - ECCV 2012. Workshops and Demonstrations, 2012

2011
GENIE TRECVID 2011 Multimedia Event Detection: Late-Fusion Approaches to Combine Multiple Audio-Visual features.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

A Bottom-Up Stepwise Knowledge-Integration Approach to Large Vocabulary Continuous Speech Recognition Using Weighted Finite State Machines.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2010

A survey on recent progress in the ASAT/SIRKUS paradigm.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A particle filter feature compensation approach to robust speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Shrinkage model adaptation in automatic speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An acoustic segment model approach to incorporating temporal information into speaker modeling for text-independent speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2009

Updated MINDS report on speech recognition and understanding, Part 2 [DSP Education].
IEEE Signal Process. Mag., 2009

Developments and directions in speech recognition and understanding, Part 1 [DSP Education].
IEEE Signal Process. Mag., 2009

A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition.
Speech Commun., 2009

Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
Proceedings of the 3rd International Universal Communication Symposium, 2009

Minimum Classification Error Training to Improve Isolated Chord Recognition.
Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

Exploring universal attribute characterization of spoken languages for spoken language recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A study on soft margin estimation of linear regression parameters for speaker adaptation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Ensemble speaker and speaking environment modeling approach with advanced online estimation process.
Proceedings of the IEEE International Conference on Acoustics, 2009

A phonetic feature based lattice rescoring approach to LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2009

A study on multilingual acoustic modeling for large vocabulary ASR.
Proceedings of the IEEE International Conference on Acoustics, 2009

A study on hidden Markov model's generalization capability for speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008
Optimizing the Performance of Spoken Language Recognition With Discriminative Training.
IEEE Trans. Speech Audio Process., 2008

Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A penalized logistic regression approach to detection based phone classification.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Continuous phone recognition without target language training data.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Soft margin estimation with various separation levels for LVCSR.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

On a generalization of margin-based discriminative training to robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Discriminative learning for optimizing detection performance in spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Toward a detector-based universal phone recognizer.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Approximate Test Risk Bound Minimization Through Soft Margin Estimation.
IEEE Trans. Speech Audio Process., 2007

A Vector Space Modeling Approach to Spoken Language Identification.
IEEE Trans. Speech Audio Process., 2007

Information fusion techniques for automatic image annotation.
Proceedings of the Second International Conference on Computer Vision Theory and Applications (VISAPP 2007), Barcelona, Spain, March 8-11, 2007

Fusion of Region and Image-Based Techniques for Automatic Image Annotation.
Proceedings of the Advances in Multimedia Modeling, 2007

An ensemble modeling approach to joint characterization of speaker and speaking environments.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Soft margin feature extraction for automatic speech recognition.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Boosting of Maximal Figure of Merit Classifiers for Automatic Image Annotation.
Proceedings of the International Conference on Image Processing, 2007

High-Accuracy Phone Recognition By Combining High-Performance Lattice Generation and Knowledge Based Rescoring.
Proceedings of the IEEE International Conference on Acoustics, 2007

Approximate Test Risk Minimization Through Soft Margin Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2007

Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Towards bottom-up continuous phone recognition.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

A study on soft margin estimation for LVCSR.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006

A vector space approach to environment modeling for robust speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A study on lattice rescoring with knowledge scores for automatic speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

A study on detection based automatic speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Soft margin estimation of hidden Markov model parameters.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Bayesian Learning of Hierarchical Multinomial Mixture Models of Concepts for Automatic Image Annotation.
Proceedings of the Image and Video Retrieval, 5th International Conference, 2006

2005
A study on separation between acoustic models and its applications.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

An acoustic segment modeling approach to automatic language identification.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

On designing and evaluating speech event detectors.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A text categorization approach to automatic language identification.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Unsupervised Speaker Adaptation for Phonetic Transcription Based Voice Dialing.
Proceedings of the Fuzzy Systems and Knowledge Discovery, Second International Conference, 2005

Iterative Training Techniques for Phonetic Template Based Speech Recognition with a Speaker-Independent Phonetic Recognizer.
Proceedings of the AI 2005: Advances in Artificial Intelligence, 2005

2002
Multilingual speech recognition with language identification.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Weighted graph based decision tree optimization for high accuracy acoustic modeling.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2000
From Graphical to Voice User Interface: The Next Revolution.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

