Chin-Hui Lee
ORCID: 0000-0002-1892-2551
Affiliations:
- Georgia Institute of Technology, School of Electrical and Computer Engineering, USA
- Bell Laboratories, Dialogue Systems Research Department, Murray Hill, NJ, USA (1981-2001)
According to our database, Chin-Hui Lee authored at least 283 papers between 2000 and 2024.
Bibliography
2024
IEEE Trans. Multim., 2024
A Variance-Preserving Interpolation Approach for Diffusion Models With Applications to Single Channel Speech Enhancement and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Optimizing Audio-Visual Speech Enhancement Using Multi-Level Distortion Measures for Audio-Visual Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement.
CoRR, 2024
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design.
CoRR, 2024
Language-Universal Speech Attributes Modeling for Zero-Shot Multilingual Spoken Keyword Recognition.
CoRR, 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
CoRR, 2024
Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori.
CoRR, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection in Low-Resource Realistic Scenarios.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints.
Proceedings of the IEEE International Conference on Acoustics, 2024
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2024
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Multi-Modal Emotion Recognition Using Entropy-Based Fusion and Pruning-Based Network Architecture Optimization.
Proceedings of the IEEE International Conference on Acoustics, 2024
A Spatial Long-Term Iterative Mask Estimation Approach for Multi-Channel Speaker Diarization and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
Space-and-speaker-aware acoustic modeling with effective data augmentation for recognition of multi-array conversational speech.
Speech Commun., September, 2023
Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions.
Speech Commun., July, 2023
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
CoRR, 2023
AD-TUNING: An Adaptive CHILD-TUNING Approach to Efficient Hyperparameter Optimization of Child Networks for Speech Processing Tasks in the SUPERB Benchmark.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Multiple-Teacher Pruning Based Self-Distillation (MT-PSD) Approach to Model Compression for Audio-Visual Wake Word Spotting.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Unsupervised Adaptation with Quality-Aware Masking to Improve Target-Speaker Voice Activity Detection for Speaker Diarization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Incorporating Visual Information Reconstruction into Progressive Learning for Optimizing Audio-Visual Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2023
A Quantum Kernel Learning Approach to Acoustic Modeling for Spoken Command Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Loss Function Design for DNN-Based Sound Event Localization and Detection on Low-Resource Realistic Data.
Proceedings of the IEEE International Conference on Acoustics, 2023
An Experimental Study on Sound Event Localization and Detection Under Realistic Testing Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2023
Incorporating Lip Features into Audio-Visual Multi-Speaker DOA Estimation by Gated Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Enhancing Privacy Preservation with Quantum Computing for Word-Level Audio-Visual Speech Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Improving Sound Event Localization and Detection with Class-Dependent Sound Separation for Real-World Scenarios.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Correlated Multi-Level Speech Enhancement for Robust Real-World ASR Applications Using Mask-Waveform-Feature Optimization.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
2022
An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
An Ensemble Teacher-Student Learning Approach with Poisson Sub-sampling to Differential Privacy Preserving Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
A Study on Joint Modeling and Data Augmentation of Multi-Modalities for Audio-Visual Scene Classification.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Separation-Based Speaker Diarization Via Iterative Model Refinement And Speaker Embedding Based Post-Processing.
Proceedings of the IEEE International Conference on Acoustics, 2022
A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer.
Proceedings of the IEEE International Conference on Acoustics, 2022
The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2MeT) Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement.
Neural Networks, 2021
A Lottery Ticket Hypothesis Framework for Low-Complexity Device-Robust Neural Acoustic Scene Classification.
CoRR, 2021
Acoustic Modeling for Multi-Array Conversational Speech Recognition in the CHiME-6 Challenge.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Audio-Visual Information Fusion Using Cross-Modal Teacher-Student Learning for Voice Activity Detection in Realistic Environments.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A Maximum Likelihood Approach to SNR-Progressive Learning Using Generalized Gaussian Distribution for LSTM-Based Speech Enhancement.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Automatic Lip-Reading with Hierarchical Pyramidal Convolution and Self-Attention for Image Sequences with No Word Boundaries.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Decentralizing Feature Extraction with Quantum Convolutional Neural Network for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
A Progressive Learning Approach to Adaptive Noise and Speech Estimation for Speech Enhancement and Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Analyzing Upper Bounds on Mean Absolute Errors for Deep Neural Network-Based Vector-to-Vector Regression.
IEEE Trans. Signal Process., 2020
A Multi-Target SNR-Progressive Learning Approach to Regression Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE Signal Process. Lett., 2020
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation.
CoRR, 2020
Using Speech Enhancement Preprocessing for Speech Emotion Recognition in Realistic Noisy Conditions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
A Noise-Aware Memory-Attention Network Architecture for Regression-Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
A Space-and-Speaker-Aware Iterative Mask Estimation Approach to Multi-Channel Speech Recognition in the CHiME-6 Challenge.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Exploring Deep Hybrid Tensor-to-Vector Network Architectures for Regression Based Speech Enhancement.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Enhanced Adversarial Strategically-Timed Attacks Against Deep Reinforcement Learning.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Cross-Task Transfer Learning Approach to Adapting Deep Speech Enhancement Models to Unseen Background Noise Using Paired Senone Classifiers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Study of Child Speech Extraction Using Joint Speech Enhancement and Separation in Realistic Conditions.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2D-to-2D Mask Estimation for Speech Enhancement Based on Fully Convolutional Neural Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Progressive Multi-Target Network Based Speech Enhancement with SNR-Preselection for Robust Speaker Diarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Tensor-To-Vector Regression for Multi-Channel Speech Enhancement Based on Tensor-Train Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Maximum Likelihood Approach to Multi-Objective Learning Using Generalized Gaussian Distributions for DNN-Based Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
High-Resolution Attention Network with Acoustic Segment Model for Acoustic Scene Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Performance Analysis for Tensor-Train Decomposition to Deep Neural Network Based Vector-to-Vector Regression.
Proceedings of the 54th Annual Conference on Information Sciences and Systems, 2020
2019
Speech Enhancement Based on Teacher-Student Deep Learning Using Improved Speech Presence Probability for Noise-Robust Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
A Theory on Deep Neural Network Based Vector-to-Vector Regression With an Illustration of Its Expressive Power in Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Improving Mispronunciation Detection of Mandarin Tones for Non-Native Learners With Soft-Target Tone Labels and BLSTM-Based Deep Tone Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Using Generalized Gaussian Distributions to Improve Regression Error Modeling for Deep Learning-Based Speech Enhancement.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
An iterative mask estimation approach to deep learning based multi-channel speech recognition.
Speech Commun., 2019
A Speaker-Dependent Approach to Separation of Far-Field Multi-Talker Microphone Array Speech for Front-End Processing in the CHiME-5 Challenge.
IEEE J. Sel. Top. Signal Process., 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
KL-Divergence Regularized Deep Neural Network Adaptation for Low-Resource Speaker-Dependent Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
DNN Training Based on Classic Gain Function for Single-channel Speech Enhancement and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
A Two-stage Single-channel Speaker-dependent Speech Separation Approach for CHiME-5 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2019
Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training.
Proceedings of the IEEE International Conference on Acoustics, 2019
A Speech Enhancement Neural Network Architecture with SNR-Progressive Multi-Target Learning for Robust Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
A LSTM-Based Joint Progressive Learning Framework for Simultaneous Speech Dereverberation and Denoising.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.
J. Signal Process. Syst., 2018
A Speaker-Dependent Approach to Single-Channel Joint Speech Separation and Acoustic Modeling Based on Deep Neural Networks for Robust Recognition of Multi-Talker Speech.
J. Signal Process. Syst., 2018
Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks.
J. Signal Process. Syst., 2018
A Multiobjective Learning and Ensembling Approach to High-Performance Speech Enhancement With Compact Neural Network Architectures.
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Acoustics-guided evaluation (AGE): a new measure for estimating performance of speech enhancement algorithms for robust ASR.
CoRR, 2018
Two-Stage Enhancement of Noisy and Reverberant Microphone Array Speech for Automatic Speech Recognition Systems Trained with Only Clean Speech.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
A Maximum Likelihood Approach to Masking-based Speech Enhancement Using Deep Neural Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Error Modeling via Asymmetric Laplace Distribution for Deep Neural Network Based Single-Channel Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
A Novel LSTM-Based Speech Preprocessor for Speaker Diarization in Realistic Mismatch Conditions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Improving Mandarin Tone Mispronunciation Detection for Non-Native Learners with Soft-Target Tone Labels and BLSTM-Based Deep Models.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Online LSTM-based Iterative Mask Estimation for Multi-Channel Speech Enhancement and ASR.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018
2017
A Deep Denoising Autoencoder Approach to Improving the Intelligibility of Vocoded Speech in Cochlear Implant Simulation.
IEEE Trans. Biomed. Eng., 2017
A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
A unified DNN approach to speaker-dependent simultaneous speech enhancement and speech separation in low SNR environments.
Speech Commun., 2017
Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation.
Pattern Recognit. Lett., 2017
An End-to-End Deep Learning Approach to Simultaneous Speech Dereverberation and Acoustic Modeling for Robust Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2017
A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation.
EURASIP J. Adv. Signal Process., 2017
An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech.
Comput. Speech Lang., 2017
On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017
A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
On Design of Robust Deep Models for CHiME-4 Multi-Channel Speech Recognition with Multiple Configurations of Array Microphones.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
A transfer learning and progressive stacking approach to reducing deep model sizes with an application to speech enhancement.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
A unified deep modeling approach to simultaneous speech dereverberation and recognition for the reverb challenge.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
Joint noise and mask aware training for DNN-based speech enhancement with sub-band features.
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
Proceedings of the Hands-free Speech Communications and Microphone Arrays, 2017
LSTM-based iterative mask estimation and post-processing for multi-channel speech enhancement.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017
2016
J. Signal Process. Syst., 2016
A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
IEEE ACM Trans. Audio Speech Lang. Process., 2016
A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition.
Neurocomputing, 2016
Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016
Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
A speaker-dependent deep learning approach to joint speech separation and acoustic modeling for multi-talker automatic speech recognition.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
An Iterative Phase Recovery Framework with Phase Mask for Spectral Mapping with an Application to Speech Enhancement.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
An experimental study on joint modeling of mixed-bandwidth data via deep neural networks for robust speech recognition.
Proceedings of the 2016 International Joint Conference on Neural Networks, 2016
Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
A study on sampling of STFT modifications in time and frequency domains for DNN-based speech dereverberation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
A study on target feature activation and normalization and their impacts on the performance of DNN based speech dereverberation systems.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Unsupervised single-channel speech separation via deep neural network for different gender mixtures.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Using tone-based extended recognition network to detect non-native Mandarin tone mispronunciations.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Zero resource anti-spoofing detection for unit selection based synthetic speech using image spectrogram artifacts.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
IEEE ACM Trans. Audio Speech Lang. Process., 2015
A Probabilistic Framework for Representing Dialog Systems and Entropy-Based Dialog Management Through Dynamic Stochastic State Evolution.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
High-resolution acoustic modeling and compact language modeling of language-universal speech attributes for spoken language identification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Speech Separation based on signal-noise-dependent deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Language-resource independent speech segmentation using cues from a spectrogram image.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Joint training of front-end and back-end deep neural networks for robust speech recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the Latent Variable Analysis and Signal Separation, 2015
A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015
An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
2014
An Efficient Gradient-based Approach to Optimizing Average Precision Through Maximal Figure-of-Merit Learning.
J. Signal Process. Syst., 2014
A MAP-based Online Estimation Approach to Ensemble Speaker and Speaking Environment Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2014
IEEE Signal Process. Lett., 2014
Neurocomputing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
A fusion approach to spoken language identification based on combining multiple phone recognizers and speech attribute detectors.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
A novel keyword+LVCSR-filler based grammar network representation for spoken keyword search.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Feature space maximum a posteriori linear regression for adaptation of deep neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A maximal figure-of-merit learning approach to maximizing mean average precision with deep neural network based classifiers.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Global variance equalization for improving deep neural network based speech enhancement.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014
2013
A Bottom-Up Modular Search Approach to Large Vocabulary Continuous Speech Recognition.
IEEE Trans. Speech Audio Process., 2013
Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems.
IEEE Trans. Speech Audio Process., 2013
IEEE Signal Process. Lett., 2013
An Information-Extraction Approach to Speech Processing: Analysis, Detection, Verification, and Recognition.
Proc. IEEE, 2013
Neurocomputing, 2013
IET Signal Process., 2013
Universal attribute characterization of spoken languages for automatic spoken language recognition.
Comput. Speech Lang., 2013
Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
An experimental study on structural-MAP approaches to implementing very large vocabulary speech recognition systems for real-world tasks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
2012
Experiments on Cross-Language Attribute Detection and Phone Recognition With Minimal Target-Specific Training Data.
IEEE Trans. Speech Audio Process., 2012
Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012
A new confidence measure combining Hidden Markov Models and Artificial Neural Networks of phonemes for effective keyword spotting.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Consumer-level multimedia event detection through unsupervised audio signal modeling.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the 2012 IEEE International Conference on Multimedia and Expo, 2012
Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the Computer Vision - ECCV 2012. Workshops and Demonstrations, 2012
2011
GENIE TRECVID 2011 Multimedia Event Detection: Late-Fusion Approaches to Combine Multiple Audio-Visual features.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011
A Bottom-Up Stepwise Knowledge-Integration Approach to Large Vocabulary Continuous Speech Recognition Using Weighted Finite State Machines.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
2010
A Study on the Generalization Capability of Acoustic Models for Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Exploiting context-dependency and acoustic resolution of universal speech attribute models in spoken language recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
An acoustic segment model approach to incorporating temporal information into speaker modeling for text-independent speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010
Experimental studies on continuous speech recognition using neural architectures with "adaptive" hidden activation functions.
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
An Ensemble Speaker and Speaking Environment Modeling Approach to Robust Speech Recognition.
IEEE Trans. Speech Audio Process., 2009
Updated MINDS report on speech recognition and understanding, Part 2 [DSP Education].
IEEE Signal Process. Mag., 2009
Developments and directions in speech recognition and understanding, Part 1 [DSP Education].
IEEE Signal Process. Mag., 2009
A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition.
Speech Commun., 2009
Soft margin estimation on improving environment structures for ensemble speaker and speaking environment modeling.
Proceedings of the 3rd International Universal Communication Symposium, 2009
Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009
Exploring universal attribute characterization of spoken languages for spoken language recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
A study on soft margin estimation of linear regression parameters for speaker adaptation.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Ensemble speaker and speaking environment modeling approach with advanced online estimation process.
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the IEEE International Conference on Acoustics, 2009
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009
MAP estimation of online mapping parameters in ensemble speaker and speaking environment modeling.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009
2008
Optimizing the Performance of Spoken Language Recognition With Discriminative Training.
IEEE Trans. Speech Audio Process., 2008
Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
On a generalization of margin-based discriminative training to robust speech recognition.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Discriminative learning for optimizing detection performance in spoken language recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
IEEE Trans. Speech Audio Process., 2007
IEEE Trans. Speech Audio Process., 2007
Information fusion techniques for automatic image annotation.
Proceedings of VISAPP 2007: Second International Conference on Computer Vision Theory and Applications, Barcelona, Spain, March 8-11, 2007
Report on the NSF-sponsored Human Language Technology Workshop on Industrial Centers.
Proceedings of Machine Translation Summit XI: Papers, 2007
Proceedings of the Advances in Multimedia Modeling, 2007
An ensemble modeling approach to joint characterization of speaker and speaking environments.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the International Conference on Image Processing, 2007
High-Accuracy Phone Recognition By Combining High-Performance Lattice Generation and Knowledge Based Rescoring.
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
Language Recognition Based on Score Distribution Feature Vectors and Discriminative Classifier Fusion.
Proceedings of the Odyssey 2006: The Speaker and Language Recognition Workshop, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Bayesian Learning of Hierarchical Multinomial Mixture Models of Concepts for Automatic Image Annotation.
Proceedings of the Image and Video Retrieval, 5th International Conference, 2006
2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
A Study on Knowledge Source Integration for Candidate Rescoring in Automatic Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Proceedings of the Fuzzy Systems and Knowledge Discovery, Second International Conference, 2005
Iterative Training Techniques for Phonetic Template Based Speech Recognition with a Speaker-Independent Phonetic Recognizer.
Proceedings of the AI 2005: Advances in Artificial Intelligence, 2005
2002
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
2000
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000