2024
NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024
2023
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
2022
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives.
CoRR, 2022
A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Fastspeech TTS with Efficient Self-Attention and Compact Feed-Forward Network.
Proceedings of the IEEE International Conference on Acoustics, 2022
A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation.
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
Cycle consistent network for end-to-end style transfer TTS training.
Neural Networks, 2021
Effective and direct control of neural TTS prosody by removing interactions between different attributes.
Neural Networks, 2021
A Survey on Neural Speech Synthesis.
CoRR, 2021
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis.
CoRR, 2021
Conversational End-to-End TTS for Voice Agents.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time.
Proceedings of the IEEE International Conference on Acoustics, 2021
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples.
Proceedings of the IEEE International Conference on Acoustics, 2021
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.
Proceedings of the IEEE International Conference on Acoustics, 2021
Speech Bert Embedding for Improving Prosody in Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2021
2020
Spoken Language Understanding of Human-Machine Conversations for Language Learning Applications.
J. Signal Process. Syst., 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis.
CoRR, 2020
Conversational End-to-End TTS for Voice Agent.
CoRR, 2020
On Early-stop Clustering for Speaker Diarization.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
2019
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data.
Speech Commun., 2019
Feature reinforcement with word embedding and parsing information in neural TTS.
CoRR, 2019
Forward-Backward Decoding for Regularizing End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A New GAN-Based End-to-End TTS Training Algorithm.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Pitch-aware Approach to Single-channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019
NN-based Ordinal Regression for Assessing Fluency of ESL Speech.
Proceedings of the IEEE International Conference on Acoustics, 2019
Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech.
Proceedings of the IEEE International Conference on Acoustics, 2019
2018
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice.
CoRR, 2018
LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis.
CoRR, 2018
Frame Selection in SI-DNN Phonetic Space with WaveNet Vocoder for Voice Conversion without Parallel Training Data.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
From Speech Signals to Semantics - Tagging Performance at Acoustic, Phonetic and Word Levels.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL learners' Speech.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
A New Glottal Neural Vocoder for Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Exploring Sequential Characteristics in Speaker Bottleneck Feature for Text-Dependent Speaker Verification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
2016
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Modeling F0 trajectories in hierarchically structured deep neural networks.
Speech Commun., 2016
Improving speaker verification performance against long-term speaker variability.
Speech Commun., 2016
A deep bidirectional LSTM approach for video-realistic talking head.
Multim. Tools Appl., 2016
Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network.
Proceedings of the NAACL HLT 2016, 2016
A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
A KL divergence and DNN approach to cross-lingual TTS.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Speaker and language factorization in DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Unsupervised speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
KL-divergence based mispronunciation detection via DNN and decision tree in the phonetic space.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers.
Speech Commun., 2015
HMM trajectory-guided sample selection for photo-realistic talking head.
Multim. Tools Appl., 2015
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding.
CoRR, 2015
Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network.
CoRR, 2015
An improved DNN-based approach to mispronunciation detection and diagnosis of L2 learners' speech.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015
Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
A spectral space warping approach to cross-lingual voice transformation in HMM-based TTS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Word embedding for recurrent neural network based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Photo-real talking head with deep bidirectional LSTM.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
From text-to-speech (TTS) to talking head - a machine learning approach to A/V speech modeling and rendering.
Proceedings of the Auditory-Visual Speech Processing, 2015
A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
2014
Pitch transformation in neural network based voice conversion.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
A new Neural Network based logistic regression classifier for improving mispronunciation detection of L2 language learners.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Sequence error (SE) minimization training of neural network for voice conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
TTS synthesis with bidirectional LSTM based recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A maximum a posteriori-based reconstruction approach to speech bandwidth expansion in noise.
Proceedings of the IEEE International Conference on Acoustics, 2014
On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014
A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training.
Proceedings of the IEEE International Conference on Acoustics, 2014
Discriminative scoring for speaker recognition based on I-vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
A Unified Trajectory Tiling Approach to High Quality Speech Rendering.
IEEE Trans. Speech Audio Process., 2013
A new language independent, photo-realistic talking head driven by voice only.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Binocular photometric stereo acquisition and reconstruction for 3d talking head applications.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
A source-filter based adaptive harmonic model and its application to speech prosody modification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL).
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
A fast table lookup based, statistical model driven non-uniform unit selection TTS.
Proceedings of the IEEE International Conference on Acoustics, 2013
2012
Computer-Assisted Audiovisual Language Learning.
Computer, 2012
Tip tap tones: mobile microtraining of mandarin sounds.
Proceedings of the Mobile HCI '12, 2012
Break index labeling of mandarin text via syntactic-to-prosodic tree mapping.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Pitch accent detection and prediction with DCT features and CRF model.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Constrained Multichannel Speech Dereverberation.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Objective Intelligibility Assessment of Text-to-Speech System using Template Constrained Generalized Posterior Probability.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Noise estimation using a constrained sequential HMM in log-spectral domain.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Modeling pitch trajectory by hierarchical HMM with minimum generation error training.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
High quality lip-sync animation for 3D photo-realistic talking head.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Improved minimum converted trajectory error training for real-time speech-to-lips conversion.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
High quality lips animation with speech and captured facial action unit as A/V input.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
Voice Activity Detection Based on an Unsupervised Learning Framework.
IEEE ACM Trans. Audio Speech Lang. Process., 2011
Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units.
IEEE Trans. Speech Audio Process., 2011
Text Driven 3D Photo-Realistic Talking Head.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
A New Phonetic Candidate Generator for Improving Search Query Efficiency.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
A Sparse and Low-rank approach to efficient face alignment for photo-real talking head synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011
Synthesizing visual speech trajectory with minimum generation error.
Proceedings of the IEEE International Conference on Acoustics, 2011
A frame mapping based HMM approach to cross-lingual voice transformation.
Proceedings of the IEEE International Conference on Acoustics, 2011
Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model.
Proceedings of the IEEE International Conference on Acoustics, 2011
Improved F0 modeling and generation in voice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2011
2010
Photo-real lips synthesis with trajectory-guided sample selection.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010
Rendering a personalized photo-real talking head from short video footage.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Automatic prosody prediction and detection with Conditional Random Field (CRF) models.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Formant-based frequency warping for improving speaker adaptation in HMM TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Synthesizing photo-real talking head via trajectory-guided sample selection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
An HMM trajectory tiling (HTT) approach to high quality TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
A hierarchical F0 modeling method for HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
A perceptual study of acceleration parameters in HMM-based TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Cross-validation based decision tree clustering for HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010
Improved modeling for F0 generation and V/U decision in HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010
Rich-context Unit Selection (RUS) approach to high quality TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010
An HMM Trajectory Tiling (HTT) Approach to High Quality TTS - Microsoft Entry to Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010
2009
A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin-English) TTS.
IEEE Trans. Speech Audio Process., 2009
Graph-Based Partial Hypothesis Fusion for Pen-Aided Speech Input.
IEEE Trans. Speech Audio Process., 2009
A Quadratic Optimization Approach to Discriminative Training of CDHMMs.
IEEE Signal Process. Lett., 2009
A Multi-Space Distribution (MSD) and two-stream tone modeling approach to Mandarin speech recognition.
Speech Commun., 2009
Rich context modeling for high quality HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Auto-checking speech transcriptions by multiple template constrained posterior.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
A minimum v/u error approach to F0 generation in HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Model-based speech separation: identifying transcription using orthogonality.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
An evidence framework for Bayesian learning of continuous-density hidden Markov models.
Proceedings of the IEEE International Conference on Acoustics, 2009
Improved prosody generation by maximizing joint likelihood of state and longer units.
Proceedings of the IEEE International Conference on Acoustics, 2009
State mapping for cross-language speaker adaptation in TTS.
Proceedings of the IEEE International Conference on Acoustics, 2009
Improving mispronunciation detection using machine learning.
Proceedings of the IEEE International Conference on Acoustics, 2009
HMM-based motion trajectory generation for speech animation synthesis.
Proceedings of the Auditory-Visual Speech Processing, 2009
2008
Identifying Language Origin of Named Entity With Multiple Information Sources.
IEEE Trans. Speech Audio Process., 2008
A Constrained Line Search Optimization Method for Discriminative Training of HMMs.
IEEE Trans. Speech Audio Process., 2008
Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR.
Comput. Speech Lang., 2008
Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Pitch Tracking for Model-Based Speech Separation.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Improving Automatic Evaluation of Mandarin Pronunciation with Speaker Adaptive Training (SAT) and MLLR Speaker Adaption.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
A real-time text to audio-visual speech synthesis system.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Efficient handwriting correction of speech recognition errors with template constrained posterior (TCP).
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
GPU-accelerated Gaussian clustering for fMPE discriminative training.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Generating natural F0 trajectory with additive trees.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Mispronunciation detection for Mandarin Chinese.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Duration refinement by jointly optimizing state and longer unit likelihood.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
A symbol graph based handwritten math expression recognition.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008
Radical based fine trajectory HMMs of online handwritten characters.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008
Automatic mispronunciation detection for Mandarin.
Proceedings of the IEEE International Conference on Acoustics, 2008
Improving letter-to-sound conversion performance with automatically generated new words.
Proceedings of the IEEE International Conference on Acoustics, 2008
Template constrained posterior for verifying phone transcriptions.
Proceedings of the IEEE International Conference on Acoustics, 2008
Symbol graph based discriminative training and rescoring for improved math symbol recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008
Prefix tree based auto-completion for convenient bi-modal chinese character input.
Proceedings of the IEEE International Conference on Acoustics, 2008
A cross-language state mapping approach to bilingual (Mandarin-English) TTS.
Proceedings of the IEEE International Conference on Acoustics, 2008
Discriminative training for improving letter-to-sound conversion performance.
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR.
IEEE Trans. Speech Audio Process., 2007
A Cohort-Based Speaker Model Synthesis for Mismatched Channels in Speaker Verification.
IEEE Trans. Speech Audio Process., 2007
A Syllable Lattice Approach to Speaker Verification.
IEEE Trans. Speech Audio Process., 2007
Performance of Discriminative HMM Training in Noise.
Int. J. Comput. Linguistics Chin. Lang. Process., 2007
Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
Perceptual annotation of expressive speech.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
An HMM-based bilingual (Mandarin-English) TTS.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007
Context constrained-generalized posterior probability for verifying phone transcriptions.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Robust F0 modeling for Mandarin speech recognition in noise.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
An unsupervised approach to automatic prosodic annotation.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Iterative unit selection with unnatural prosody detection.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Model-based speech separation with single-microphone input.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Minimum Error Discriminative Training for Radical-Based Online Chinese Handwriting Recognition.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007
A Unified Framework for Symbol Segmentation and Recognition of Handwritten Mathematical Expressions.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007
A MSD-HMM Approach to Pen Trajectory Modeling for Online Handwriting Recognition.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007
Generalized Segment Posterior Probability for Automatic Mandarin Pronunciation Evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2007
Word Graph Based Feature Enhancement for Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007
A Segmentation Posterior Based Endpointing Algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2007
Full HMM Training for Minimizing Generation Error in Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2007
Agreement Learning for Automatic Accent Annotation.
Proceedings of the IEEE International Conference on Acoustics, 2007
A Constrained Line Search Optimization for Discriminative Training in Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007
A New Minimum Divergence Approach to Discriminative Training.
Proceedings of the IEEE International Conference on Acoustics, 2007
Divergence-Based Similarity Measure for Spoken Document Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2007
Enrich Web Applications with Voice Internet Persona Text-to-Speech for Anyone, Anywhere.
Proceedings of the Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 2007
A constrained line search approach to general discriminative HMM training.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
2006
A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification.
Speech Commun., 2006
Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006
Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units.
IEICE Trans. Inf. Syst., 2006
Automatic Detection of Tone Mispronunciation in Mandarin.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
A Robust Voice Activity Detection Based on Noise Eigenspace Projection.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Signal Trajectory Based Noise Compensation for Robust Speech Recognition.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Training Discriminative HMM by Optimal Allocation of Gaussian Kernels.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
An HMM-Based Mandarin Chinese Text-To-Speech System.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Non-uniform Kernel Allocation Based Parsimonious HMM.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Noisy Speech Recognition Performance of Discriminative HMMs.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
The Paradigm for Creating Multi-lingual Text-To-Speech Voice Databases.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Integrating Hypotheses of Multiple Recognizers for Improving Mandarin LVCSR Performance.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
A multi-space distribution (MSD) approach to speech recognition of tonal languages.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Auto-segmentation based VAD for robust ASR.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Generalization of the minimum classification error (MCE) training based on maximizing generalized posterior probability (GPP).
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Minimum divergence based discriminative training.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Word graph based speech recognition error correction by handwriting input.
Proceedings of the 8th International Conference on Multimodal Interfaces, 2006
Improved Chinese Character Input by Merging Speech and Handwriting Recognition Hypotheses.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Auto-Segmentation Based Partitioning and Clustering Approach to Robust Endpointing.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
An Iterative Trajectory Regeneration Algorithm for Separating Mixed Speech Sources.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Syllable Lattice Based Re-Scoring For Speaker Verification.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Weighted Likelihood Ratio (WLR) Hidden Markov Model for Noisy Speech Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
A Study on How Human Annotations Benefit the TTS Voice.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006, 2006
2005
A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification.
IEEE Trans. Speech Audio Process., 2005
Verification of Multi-Class Recognition Decision: A Classification Approach.
IEICE Trans. Inf. Syst., 2005
Refining phoneme segmentations using speaker-adaptive context dependent boundary models.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Phonetic transcription verification with generalized posterior probability.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Background model based posterior probability for measuring confidence.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Optimal Clustering and Non-Uniform Allocation of Gaussian Kernels in Scalar Dimension for HMM Compression.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
2004
On noise robustness of dynamic and static features for continuous Cantonese digit recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Generalized posterior probability for minimizing verification errors at subword, word and sentence levels.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Improved spoken language translation using n-best speech recognition hypotheses.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Optimal acoustic and language model weights for minimizing word verification errors.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Tone information as a confidence measure for improving Cantonese LVCSR.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Robust verification of recognized words in noise.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech Recognition and Machine Translation.
Proceedings of the COLING 2004, 2004
2003
On divergence based clustering of normal distributions and its application to HMM adaptation.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Modeling Cantonese pronunciation variation by acoustic model refinement.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Optimal clustering of multivariate normal distributions using divergence and its application to HMM adaptation.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
Combining neighboring filter channels to improve quantile based histogram equalization.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
2002
Recognition of noisy speech using normalized moments.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Bell Labs approach to Aurora evaluation on connected digit recognition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Classifier design for verification of multi-class recognition decision.
Proceedings of the IEEE International Conference on Acoustics, 2002
A dynamic in-search discriminative training approach for large vocabulary speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2002
2001
A real-time Japanese broadcast news closed-captioning system.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
An auditory system-based feature for robust speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
A data selection strategy for utterance verification in continuous speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Evaluating the Aurora connected digit recognition task - a Bell Labs approach.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Hierarchical stochastic feature matching for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2001
2000
Hands-free human-machine dialogue - corpora, technology and evaluation.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
A high-performance auditory feature for robust speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
1999
Recent advancements in automatic speaker authentication.
IEEE Robotics Autom. Mag., 1999
A block least squares approach to acoustic echo cancellation.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999
Hidden Markov models with divergence based vector quantized variances.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999
1998
Improved utterance rejection using length dependent thresholds.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998
1997
Generalized mixture of HMMs for continuous speech recognition.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997
1996
Quantizing mixture-weights in a tied-mixture HMM.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996
High-accuracy connected digit recognition for mobile applications.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996
1995
Optimizing baseforms for HMM-based speech recognition.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995
Large vocabulary, word-based Mandarin dictation system.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995
An orthogonal polynomial representation of speech signals and its probabilistic model for text independent speaker verification.
Proceedings of the 1995 International Conference on Acoustics, 1995
1994
A fast algorithm for large vocabulary keyword spotting application.
IEEE Trans. Speech Audio Process., 1994
An N-best candidates-based discriminative training for speech recognition applications.
IEEE Trans. Speech Audio Process., 1994
A Minimum Error Rate Pattern Recognition Approach to Speech Recognition.
Int. J. Pattern Recognit. Artif. Intell., 1994
The use of tree-trellis search for large-vocabulary Mandarin polysyllabic word speech recognition.
Comput. Speech Lang., 1994
Cepstral channel normalization techniques for HMM-based speaker verification.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994
Large vocabulary word recognition based on tree-trellis search.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994
Discriminative training of high performance speech recognizer using N best candidates.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994
1993
Optimal quantization of LSP parameters.
IEEE Trans. Speech Audio Process., 1993
1992
Continuous mixture HMM-LR using the A* algorithm for continuous speech recognition.
Proceedings of the Second International Conference on Spoken Language Processing, 1992
The use of cohort normalized scores for speaker verification.
Proceedings of the Second International Conference on Spoken Language Processing, 1992
Continuous probabilistic acoustic map for speaker recognition.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992
1990
A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition.
Proceedings of the Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, 1990
A tree-trellis based fast search for finding the n best sentence hypotheses in continuous speech recognition.
Proceedings of the First International Conference on Spoken Language Processing, 1990
Experiments in automatic talker verification using sub-word unit hidden Markov models.
Proceedings of the First International Conference on Spoken Language Processing, 1990
Optimal quantization of LSP parameters using delayed decisions.
Proceedings of the 1990 International Conference on Acoustics, 1990
Sub-word unit talker verification using hidden Markov models.
Proceedings of the 1990 International Conference on Acoustics, 1990
Speaker recognition based on source coding approaches.
Proceedings of the 1990 International Conference on Acoustics, 1990
A probabilistic acoustic map based discriminative HMM training.
Proceedings of the 1990 International Conference on Acoustics, 1990
Statistical segmentation and word modeling techniques in isolated word recognition.
Proceedings of the 1990 International Conference on Acoustics, 1990
1989
High performance connected digit recognition using hidden Markov models.
IEEE Trans. Acoust. Speech Signal Process., 1989
A phonetically labeled acoustic segment (PLAS) approach to speech analysis-synthesis.
Proceedings of the IEEE International Conference on Acoustics, 1989
Word recognition using whole word and subword models.
Proceedings of the IEEE International Conference on Acoustics, 1989
1988
A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise.
IEEE Trans. Acoust. Speech Signal Process., 1988
On the use of instantaneous and transitional spectral information in speaker recognition.
IEEE Trans. Acoust. Speech Signal Process., 1988
Optimal quantization of LSP parameters [speech coding].
Proceedings of the IEEE International Conference on Acoustics, 1988
A segment model based approach to speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1988
1987
On the automatic segmentation of speech signals.
Proceedings of the IEEE International Conference on Acoustics, 1987
A training procedure for a segment-based-network approach to isolated word recognition.
Proceedings of the IEEE International Conference on Acoustics, 1987
1986
A high quality subband speech coder with backward adaptive predictor and optimal time-frequency bit assignment.
Proceedings of the IEEE International Conference on Acoustics, 1986
Evaluation of a vector quantization talker recognition system in text independent and text dependent modes.
Proceedings of the IEEE International Conference on Acoustics, 1986
1985
A vector-quantization-based preprocessor for speaker-independent isolated word recognition.
IEEE Trans. Acoust. Speech Signal Process., 1985
Comparative study of several distortion measures for speech recognition.
Speech Commun., 1985
Single-frame vowel recognition using vector quantization with several distance measures.
AT&T Tech. J., 1985
Incorporation of temporal structure into a vector-quantization-based preprocessor for speaker-independent, isolated-word recognition.
AT&T Tech. J., 1985
A vector quantization approach to speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 1985
Subband coding of speech using backward adaptive prediction and bit allocation.
Proceedings of the IEEE International Conference on Acoustics, 1985
An efficient vector-quantization preprocessor for speaker independent isolated word recognition.
Proceedings of the IEEE International Conference on Acoustics, 1985
1984
On the performance of isolated word speech recognizers using vector quantization and temporal energy contours.
AT&T Bell Lab. Tech. J., 1984
Line spectrum pair (LSP) and speech data compression.
Proceedings of the IEEE International Conference on Acoustics, 1984
On the use of transient information in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1984
1982
Fast least-squares (LS) in the voice echo cancellation application.
Proceedings of the IEEE International Conference on Acoustics, 1982
On the high resolution and unbiased frequency estimates of sinusoids in white noise - a new adaptive approach.
Proceedings of the IEEE International Conference on Acoustics, 1982
1981
On the asymptotic behavior of a complex adaptive line enhancer (CALE).
Proceedings of the IEEE International Conference on Acoustics, 1981
1980
Fast spectral estimation of speech signal in analytic form.
Proceedings of the IEEE International Conference on Acoustics, 1980
1978
Frequency estimation by linear prediction.
Proceedings of the IEEE International Conference on Acoustics, 1978
Observations on linear estimation.
Proceedings of the IEEE International Conference on Acoustics, 1978