Frank K. Soong

ORCID: 0000-0002-9088-3577

Affiliations:
  • Microsoft Research Asia, Beijing, China
  • Chinese University of Hong Kong (CUHK), Department of Systems Engineering and Engineering Management, Hong Kong
  • Bell Labs Research, Murray Hill, NJ, USA
  • Stanford University, Department of Electrical Engineering, CA, USA (PhD)


According to our database, Frank K. Soong authored at least 305 papers between 1978 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Awards

IEEE Fellow

IEEE Fellow 2010, "For contributions to speech processing".

Bibliography

2024
NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2024

2023
MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022
ParaTTS: Learning Linguistic and Prosodic Cross-Sentence Information in Paragraph-Based TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Disentangling Style and Speaker Attributes for TTS Style Transfer.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives.
CoRR, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Fastspeech TTS with Efficient Self-Attention and Compact Feed-Forward Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Universal Ordinal Regression for Assessing Phoneme-Level Pronunciation.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Cycle consistent network for end-to-end style transfer TTS training.
Neural Networks, 2021

Effective and direct control of neural TTS prosody by removing interactions between different attributes.
Neural Networks, 2021

A Survey on Neural Speech Synthesis.
CoRR, 2021

Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis.
CoRR, 2021

Conversational End-to-End TTS for Voice Agents.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Improving Performance of Seen and Unseen Speech Style Transfer in End-to-End Neural TTS.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A New High Quality Trajectory Tiling Based Hybrid TTS In Real Time.
Proceedings of the IEEE International Conference on Acoustics, 2021

Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples.
Proceedings of the IEEE International Conference on Acoustics, 2021

MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speech Bert Embedding for Improving Prosody in Neural TTS.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Spoken Language Understanding of Human-Machine Conversations for Language Learning Applications.
J. Signal Process. Syst., 2020

s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis.
CoRR, 2020

Conversational End-to-End TTS for Voice Agent.
CoRR, 2020

On Early-stop Clustering for Speaker Diarization.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Transfer Learning for Improving Singing-Voice Detection in Polyphonic Instrumental Music.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Efficient Subband Linear Prediction for LPCNet-Based Neural Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Improved Frame-Unit-Selection Based Voice Conversion System Without Parallel Training Data.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving Prosody with Linguistic and Bert Derived Features in Multi-Speaker Based Mandarin Chinese Neural TTS.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Improving LPCNET-Based Text-to-Speech with Linear Prediction-Structured Mixture Density Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

2019
Voice conversion with SI-DNN and KL divergence based mapping without parallel training data.
Speech Commun., 2019

Feature reinforcement with word embedding and parsing information in neural TTS.
CoRR, 2019

Forward-Backward Decoding for Regularizing End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A New GAN-Based End-to-End TTS Training Algorithm.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Pitch-aware Approach to Single-channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

NN-based Ordinal Regression for Assessing Fluency of ESL Speech.
Proceedings of the IEEE International Conference on Acoustics, 2019

Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech.
Proceedings of the IEEE International Conference on Acoustics, 2019

2018
Modeling Multi-speaker Latent Space to Improve Neural TTS: Quick Enrolling New Speaker and Enhancing Premium Voice.
CoRR, 2018

LP-WaveNet: Linear Prediction-based WaveNet Speech Synthesis.
CoRR, 2018

Frame Selection in SI-DNN Phonetic Space with WaveNet Vocoder for Voice Conversion without Parallel Training Data.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

From Speech Signals to Semantics - Tagging Performance at Acoustic, Phonetic and Word Levels.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL learners' Speech.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A New Glottal Neural Vocoder for Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Exploring Sequential Characteristics in Speaker Bottleneck Feature for Text-Dependent Speaker Verification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Effective Spectral and Excitation Modeling Techniques for LSTM-RNN-Based Speech Synthesis Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Proficiency Assessment of ESL Learner's Sentence Prosody with TTS Synthesized Voice as Reference.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Improving Sub-Phone Modeling for Better Native Language Identification with Non-Native English Speech.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Perceptual quality and modeling accuracy of excitation parameters in DLSTM-based speech synthesis systems.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Improving native language (L1) identification with better VAD and TDNN trained separately on native and non-native English corpora.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Modeling F0 trajectories in hierarchically structured deep neural networks.
Speech Commun., 2016

Improving speaker verification performance against long-term speaker variability.
Speech Commun., 2016

A deep bidirectional LSTM approach for video-realistic talking head.
Multim. Tools Appl., 2016

Learning Distributed Word Representations For Bidirectional LSTM Recurrent Neural Network.
Proceedings of the NAACL HLT 2016, 2016

A KL Divergence and DNN-Based Approach to Voice Conversion without Parallel Training Sentences.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved Time-Frequency Trajectory Excitation Vocoder for DNN-Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A KL divergence and DNN approach to cross-lingual TTS.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Speaker and language factorization in DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Unsupervised speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

KL-divergence based mispronunciation detection via DNN and decision tree in the phonetic space.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Improved mispronunciation detection with deep neural network trained acoustic models and transfer learning based logistic regression classifiers.
Speech Commun., 2015

HMM trajectory-guided sample selection for photo-realistic talking head.
Multim. Tools Appl., 2015

A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding.
CoRR, 2015

Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network.
CoRR, 2015

An improved DNN-based approach to mispronunciation detection and diagnosis of L2 learners' speech.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A spectral space warping approach to cross-lingual voice transformation in HMM-based TTS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Word embedding for recurrent neural network based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Photo-real talking head with deep bidirectional LSTM.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

From text-to-speech (TTS) to talking head - a machine learning approach to A/V speech modeling and rendering.
Proceedings of the Auditory-Visual Speech Processing, 2015

A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
Pitch transformation in neural network based voice conversion.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A new Neural Network based logistic regression classifier for improving mispronunciation detection of L2 language learners.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Sequence error (SE) minimization training of neural network for voice conversion.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

TTS synthesis with bidirectional LSTM based recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A maximum a Posterior-based reconstruction approach to speech bandwidth expansion in noise.
Proceedings of the IEEE International Conference on Acoustics, 2014

On the training aspects of Deep Neural Network (DNN) for parametric TTS synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2014

A DNN-based acoustic modeling of tonal language and its application to Mandarin pronunciation training.
Proceedings of the IEEE International Conference on Acoustics, 2014

Discriminative scoring for speaker recognition based on I-vectors.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013
A Unified Trajectory Tiling Approach to High Quality Speech Rendering.
IEEE Trans. Speech Audio Process., 2013

A new language independent, photo-realistic talking head driven by voice only.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Binocular photometric stereo acquisition and reconstruction for 3d talking head applications.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A source-filter based adaptive harmonic model and its application to speech prosody modification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A new DNN-based high quality pronunciation evaluation for computer-aided language learning (CALL).
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A fast table lookup based, statistical model driven non-uniform unit selection TTS.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Computer-Assisted Audiovisual Language Learning.
Computer, 2012

Tip tap tones: mobile microtraining of mandarin sounds.
Proceedings of the Mobile HCI '12, 2012

Break index labeling of mandarin text via syntactic-to-prosodic tree mapping.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Cross validation and Minimum Generation Error for improved model clustering in HMM-based TTS.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A unified trajectory tiling approach to high quality TTS and cross-lingual voice transformation.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Pitch accent detection and prediction with DCT features and CRF model.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Constrained Multichannel Speech Dereverberation.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Objective Intelligibility Assessment of Text-to-Speech System using Template Constrained Generalized Posterior Probability.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Turning a Monolingual Speaker into Multilingual for a Mixed-language TTS.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Noise estimation using a constrained sequential HMM in log-spectral domain.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Modeling pitch trajectory by hierarchical HMM with minimum generation error training.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

High quality lip-sync animation for 3D photo-realistic talking head.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Improved minimum converted trajectory error training for real-time speech-to-lips conversion.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

High quality lips animation with speech and captured facial action unit as A/V input.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
Voice Activity Detection Based on an Unsupervised Learning Framework.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units.
IEEE Trans. Speech Audio Process., 2011

Text Driven 3D Photo-Realistic Talking Head.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A New Phonetic Candidate Generator for Improving Search Query Efficiency.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improvements in Speaker Characterization Using Spectral Subband Energy Based on Harmonic plus Noise Model.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Sparse and Low-rank approach to efficient face alignment for photo-real talking head synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011

Synthesizing visual speech trajectory with minimum generation error.
Proceedings of the IEEE International Conference on Acoustics, 2011

A frame mapping based HMM approach to cross-lingual voice transformation.
Proceedings of the IEEE International Conference on Acoustics, 2011

Speaker characterization using spectral subband energy ratio based on Harmonic plus Noise Model.
Proceedings of the IEEE International Conference on Acoustics, 2011

Improved F0 modeling and generation in voice conversion.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Photo-real lips synthesis with trajectory-guided sample selection.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Rendering a personalized photo-real talking head from short video footage.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Automatic prosody prediction and detection with Conditional Random Field (CRF) models.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Formant-based frequency warping for improving speaker adaptation in HMM TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Synthesizing photo-real talking head via trajectory-guided sample selection.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

An HMM trajectory tiling (HTT) approach to high quality TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A hierarchical F0 modeling method for HMM-based speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A perceptual study of acceleration parameters in HMM-based TTS.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Cross-validation based decision tree clustering for HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010

Improved modeling for F0 generation and V/U decision in HMM-based TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010

Rich-context Unit Selection (RUS) approach to high quality TTS.
Proceedings of the IEEE International Conference on Acoustics, 2010

An HMM Trajectory Tiling (HTT) Approach to High Quality TTS - Microsoft Entry to Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010

2009
A Cross-Language State Sharing and Mapping Approach to Bilingual (Mandarin-English) TTS.
IEEE Trans. Speech Audio Process., 2009

Graph-Based Partial Hypothesis Fusion for Pen-Aided Speech Input.
IEEE Trans. Speech Audio Process., 2009

A Quadratic Optimization Approach to Discriminative Training of CDHMMs.
IEEE Signal Process. Lett., 2009

A Multi-Space Distribution (MSD) and two-stream tone modeling approach to Mandarin speech recognition.
Speech Commun., 2009

Rich context modeling for high quality HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Auto-checking speech transcriptions by multiple template constrained posterior.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A minimum v/u error approach to F0 generation in HMM-based TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Model-based speech separation: identifying transcription using orthogonality.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

An evidence framework for Bayesian learning of continuous-density hidden Markov models.
Proceedings of the IEEE International Conference on Acoustics, 2009

Improved prosody generation by maximizing joint likelihood of state and longer units.
Proceedings of the IEEE International Conference on Acoustics, 2009

State mapping for cross-language speaker adaptation in TTS.
Proceedings of the IEEE International Conference on Acoustics, 2009

Improving mispronunciation detection using machine learning.
Proceedings of the IEEE International Conference on Acoustics, 2009

HMM-based motion trajectory generation for speech animation synthesis.
Proceedings of the Auditory-Visual Speech Processing, 2009

2008
Identifying Language Origin of Named Entity With Multiple Information Sources.
IEEE Trans. Speech Audio Process., 2008

A Constrained Line Search Optimization Method for Discriminative Training of HMMs.
IEEE Trans. Speech Audio Process., 2008

Tone-enhanced generalized character posterior probability (GCPP) for Cantonese LVCSR.
Comput. Speech Lang., 2008

Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Pitch Tracking for Model-Based Speech Separation.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Improving Automatic Evaluation of Mandarin Pronunciation with Speaker Adaptive Training (SAT) and MLLR Speaker Adaption.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Prosody for Mandarin speech recognition: a comparative study of read and spontaneous speech.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A real-time text to audio-visual speech synthesis system.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Efficient handwriting correction of speech recognition errors with template constrained posterior (TCP).
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

GPU-accelerated Gaussian clustering for fMPE discriminative training.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Generating natural F0 trajectory with additive trees.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

An ellipsoid constrained quadratic programming perspective to discriminative training of HMMs.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Mispronunciation detection for Mandarin Chinese.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Duration refinement by jointly optimizing state and longer unit likelihood.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

A symbol graph based handwritten math expression recognition.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Radical based fine trajectory HMMs of online handwritten characters.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Automatic mispronunciation detection for Mandarin.
Proceedings of the IEEE International Conference on Acoustics, 2008

Improving letter-to-sound conversion performance with automatically generated new words.
Proceedings of the IEEE International Conference on Acoustics, 2008

Template constrained posterior for verifying phone transcriptions.
Proceedings of the IEEE International Conference on Acoustics, 2008

Symbol graph based discriminative training and rescoring for improved math symbol recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Prefix tree based auto-completion for convenient bi-modal chinese character input.
Proceedings of the IEEE International Conference on Acoustics, 2008

A cross-language state mapping approach to bilingual (Mandarin-English) TTS.
Proceedings of the IEEE International Conference on Acoustics, 2008

Discriminative training for improving letter-to-sound conversion performance.
Proceedings of the IEEE International Conference on Acoustics, 2008

2007
Static and Dynamic Spectral Features: Their Noise Robustness and Optimal Weights for ASR.
IEEE Trans. Speech Audio Process., 2007

A Cohort-Based Speaker Model Synthesis for Mismatched Channels in Speaker Verification.
IEEE Trans. Speech Audio Process., 2007

A Syllable Lattice Approach to Speaker Verification.
IEEE Trans. Speech Audio Process., 2007

Performance of Discriminative HMM Training in Noise.
Int. J. Comput. Linguistics Chin. Lang. Process., 2007

Measuring attribute dissimilarity with HMM KL-divergence for speech synthesis.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Perceptual annotation of expressive speech.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

An HMM-based bilingual (Mandarin-English) TTS.
Proceedings of the Sixth ISCA Workshop on Speech Synthesis, 2007

Context constrained-generalized posterior probability for verifying phone transcriptions.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Robust F0 modeling for Mandarin speech recognition in noise.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

An unsupervised approach to automatic prosodic annotation.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Iterative unit selection with unnatural prosody detection.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Model-based speech separation with single-microphone input.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Minimum Error Discriminative Training for Radical-Based Online Chinese Handwriting Recognition.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007

A Unified Framework for Symbol Segmentation and Recognition of Handwritten Mathematical Expressions.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007

A MSD-HMM Approach to Pen Trajectory Modeling for Online Handwriting Recognition.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007

Generalized Segment Posterior Probability for Automatic Mandarin Pronunciation Evaluation.
Proceedings of the IEEE International Conference on Acoustics, 2007

Word Graph Based Feature Enhancement for Noisy Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

A Segmentation Posterior Based Endpointing Algorithm.
Proceedings of the IEEE International Conference on Acoustics, 2007

Full HMM Training for Minimizing Generation Error in Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2007

Agreement Learning for Automatic Accent Annotation.
Proceedings of the IEEE International Conference on Acoustics, 2007

A Constrained Line Search Optimization for Discriminative Training in Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

A New Minimum Divergence Approach to Discriminative Training.
Proceedings of the IEEE International Conference on Acoustics, 2007

Divergence-Based Similarity Measure for Spoken Document Retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2007

Enrich Web Applications with Voice Internet Persona Text-to-Speech for Anyone, Anywhere.
Proceedings of the Human-Computer Interaction. HCI Intelligent Multimodal Interaction Environments, 2007

A constrained line search approach to general discriminative HMM training.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
A tree-based kernel selection approach to efficient Gaussian mixture model-universal background model based speaker identification.
Speech Commun., 2006

Modeling Cantonese Pronunciation Variations for Large-Vocabulary Continuous Speech Recognition.
Int. J. Comput. Linguistics Chin. Lang. Process., 2006

Context-Dependent Boundary Model for Refining Boundaries Segmentation of TTS Units.
IEICE Trans. Inf. Syst., 2006

Automatic Detection of Tone Mispronunciation in Mandarin.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

A Robust Voice Activity Detection Based on Noise Eigenspace Projection.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Signal Trajectory Based Noise Compensation for Robust Speech Recognition.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Training Discriminative HMM by Optimal Allocation of Gaussian Kernels.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Improved Mandarin Speech Recognition by Lattice Rescoring with Enhanced Tone Models.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

An HMM-Based Mandarin Chinese Text-To-Speech System.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Non-uniform Kernel Allocation Based Parsimonious HMM.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Noisy Speech Recognition Performance of Discriminative HMMs.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

The Paradigm for Creating Multi-lingual Text-To-Speech Voice Databases.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Integrating Hypotheses of Multiple Recognizers for Improving Mandarin LVCSR Performance.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

A multi-space distribution (MSD) approach to speech recognition of tonal languages.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Auto-segmentation based VAD for robust ASR.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Generalization of the minimum classification error (MCE) training based on maximizing generalized posterior probability (GPP).
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Minimum divergence based discriminative training.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Word graph based speech recognition error correction by handwriting input.
Proceedings of the 8th International Conference on Multimodal Interfaces, 2006

Improved Chinese Character Input by Merging Speech and Handwriting Recognition Hypotheses.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Auto-Segmentation Based Partitioning and Clustering Approach to Robust Endpointing.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

An Iterative Trajectory Regeneration Algorithm for Separating Mixed Speech Sources.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Syllable Lattice Based Re-Scoring For Speaker Verification.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Weighted Likelihood Ratio (WLR) Hidden Markov Model for Noisy Speech Recognition.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

A Study on How Human Annotations Benefit the TTS Voice.
Proceedings of the Blizzard Challenge 2006, Pittsburgh, PA, USA, September 16, 2006

2005
A Dynamic In-Search Data Selection Method With Its Applications to Acoustic Modeling and Utterance Verification.
IEEE Trans. Speech Audio Process., 2005

Verification of Multi-Class Recognition Decision: A Classification Approach.
IEICE Trans. Inf. Syst., 2005

Refining phoneme segmentations using speaker-adaptive context dependent boundary models.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Phonetic transcription verification with generalized posterior probability.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Background model based posterior probability for measuring confidence.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Harmonic filtering for joint estimation of pitch and voiced source with single-microphone input.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Generalized Posterior Probability for Minimum Error Verification of Recognized Sentences.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Optimal Clustering and Non-Uniform Allocation of Gaussian Kernels in Scalar Dimension for HMM Compression.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
On noise robustness of dynamic and static features for continuous Cantonese digit recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Generalized posterior probability for minimizing verification errors at subword, word and sentence levels.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Improved spoken language translation using n-best speech recognition hypotheses.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Optimal acoustic and language model weights for minimizing word verification errors.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Tone information as a confidence measure for improving Cantonese LVCSR.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Robust verification of recognized words in noise.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A Unified Approach in Speech-to-Speech Translation: Integrating Features of Speech recognition and Machine Translation.
Proceedings of the COLING 2004, 2004

2003
On divergence based clustering of normal distributions and its application to HMM adaptation.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Modeling Cantonese pronunciation variation by acoustic model refinement.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Optimal clustering of multivariate normal distributions using divergence and its application to HMM adaptation.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Combining neighboring filter channels to improve quantile based histogram equalization.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002
Recognition of noisy speech using normalized moments.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Bell Labs approach to Aurora evaluation on connected digit recognition.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Classifier design for verification of multi-class recognition decision.
Proceedings of the IEEE International Conference on Acoustics, 2002

A dynamic in-search discriminative training approach for large vocabulary speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2002

2001
A real-time Japanese broadcast news closed-captioning system.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

An auditory system-based feature for robust speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

A data selection strategy for utterance verification in continuous speech recognition.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Evaluating the Aurora connected digit recognition task - a Bell Labs approach.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Hierarchical stochastic feature matching for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2001

2000
Hands-free human-machine dialogue - corpora, technology and evaluation.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

A high-performance auditory feature for robust speech recognition.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1999
Recent advancements in automatic speaker authentication.
IEEE Robotics Autom. Mag., 1999

A block least squares approach to acoustic echo cancellation.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

Hidden Markov models with divergence based vector quantized variances.
Proceedings of the 1999 IEEE International Conference on Acoustics, 1999

1998
Improved utterance rejection using length dependent thresholds.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

1997
Generalized mixture of HMMs for continuous speech recognition.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996
Quantizing mixture-weights in a tied-mixture HMM.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

High-accuracy connected digit recognition for mobile applications.
Proceedings of the 1996 IEEE International Conference on Acoustics, 1996

1995
Optimizing baseforms for HMM-based speech recognition.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

Large vocabulary, word-based Mandarin dictation system.
Proceedings of the Fourth European Conference on Speech Communication and Technology, 1995

An orthogonal polynomial representation of speech signals and its probabilistic model for text independent speaker verification.
Proceedings of the 1995 International Conference on Acoustics, 1995

1994
A fast algorithm for large vocabulary keyword spotting application.
IEEE Trans. Speech Audio Process., 1994

An N-best candidates-based discriminative training for speech recognition applications.
IEEE Trans. Speech Audio Process., 1994

A Minimum Error Rate Pattern Recognition Approach to Speech Recognition.
Int. J. Pattern Recognit. Artif. Intell., 1994

The use of tree-trellis search for large-vocabulary Mandarin polysyllabic word speech recognition.
Comput. Speech Lang., 1994

Cepstral channel normalization techniques for HMM-based speaker verification.
Proceedings of the 3rd International Conference on Spoken Language Processing, 1994

Large vocabulary word recognition based on tree-trellis search.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

Discriminative training of high performance speech recognizer using N best candidates.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994

1993
Optimal quantization of LSP parameters.
IEEE Trans. Speech Audio Process., 1993

1992
Continuous mixture HMM-LR using the A* algorithm for continuous speech recognition.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

The use of cohort normalized scores for speaker verification.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

Continuous probabilistic acoustic map for speaker recognition.
Proceedings of the 1992 IEEE International Conference on Acoustics, 1992

1990
A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition.
Speech and Natural Language: Proceedings of a Workshop Held at Hidden Valley, 1990

A tree-trellis based fast search for finding the n best sentence hypotheses in continuous speech recognition.
Proceedings of the First International Conference on Spoken Language Processing, 1990

Experiments in automatic talker verification using sub-word unit hidden Markov models.
Proceedings of the First International Conference on Spoken Language Processing, 1990

Optimal quantization of LSP parameters using delayed decisions.
Proceedings of the 1990 International Conference on Acoustics, 1990

Sub-word unit talker verification using hidden Markov models.
Proceedings of the 1990 International Conference on Acoustics, 1990

Speaker recognition based on source coding approaches.
Proceedings of the 1990 International Conference on Acoustics, 1990

A probabilistic acoustic map based discriminative HMM training.
Proceedings of the 1990 International Conference on Acoustics, 1990

Statistical segmentation and word modeling techniques in isolated word recognition.
Proceedings of the 1990 International Conference on Acoustics, 1990

1989
High performance connected digit recognition using hidden Markov models.
IEEE Trans. Acoust. Speech Signal Process., 1989

A phonetically labeled acoustic segment (PLAS) approach to speech analysis-synthesis.
Proceedings of the IEEE International Conference on Acoustics, 1989

Word recognition using whole word and subword models.
Proceedings of the IEEE International Conference on Acoustics, 1989

1988
A frequency-weighted Itakura spectral distortion measure and its application to speech recognition in noise.
IEEE Trans. Acoust. Speech Signal Process., 1988

On the use of instantaneous and transitional spectral information in speaker recognition.
IEEE Trans. Acoust. Speech Signal Process., 1988

Optimal quantization of LSP parameters [speech coding].
Proceedings of the IEEE International Conference on Acoustics, 1988

A segment model based approach to speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1988

1987
On the automatic segmentation of speech signals.
Proceedings of the IEEE International Conference on Acoustics, 1987

A training procedure for a segment-based-network approach to isolated word recognition.
Proceedings of the IEEE International Conference on Acoustics, 1987

1986
A high quality subband speech coder with backward adaptive predictor and optimal time-frequency bit assignment.
Proceedings of the IEEE International Conference on Acoustics, 1986

Evaluation of a vector quantization talker recognition system in text independent and text dependent modes.
Proceedings of the IEEE International Conference on Acoustics, 1986

1985
A vector-quantization-based preprocessor for speaker-independent isolated word recognition.
IEEE Trans. Acoust. Speech Signal Process., 1985

Comparative study of several distortion measures for speech recognition.
Speech Commun., 1985

Single-frame vowel recognition using vector quantization with several distance measures.
AT&T Tech. J., 1985

Incorporation of temporal structure into a vector-quantization-based preprocessor for speaker-independent, isolated-word recognition.
AT&T Tech. J., 1985

A vector quantization approach to speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 1985

Subband coding of speech using backward adaptive prediction and bit allocation.
Proceedings of the IEEE International Conference on Acoustics, 1985

An efficient vector-quantization preprocessor for speaker independent isolated word recognition.
Proceedings of the IEEE International Conference on Acoustics, 1985

1984
On the performance of isolated word speech recognizers using vector quantization and temporal energy contours.
AT&T Bell Lab. Tech. J., 1984

Line spectrum pair (LSP) and speech data compression.
Proceedings of the IEEE International Conference on Acoustics, 1984

On the use of transient information in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 1984

1982
Fast least-squares (LS) in the voice echo cancellation application.
Proceedings of the IEEE International Conference on Acoustics, 1982

On the high resolution and unbiased frequency estimates of sinusoids in white noise: A new adaptive approach.
Proceedings of the IEEE International Conference on Acoustics, 1982

1981
On the asymptotic behavior of a complex adaptive line enhancer (CALE).
Proceedings of the IEEE International Conference on Acoustics, 1981

1980
Fast spectral estimation of speech signal in analytic form.
Proceedings of the IEEE International Conference on Acoustics, 1980

1978
Frequency estimation by linear prediction.
Proceedings of the IEEE International Conference on Acoustics, 1978

Observations on linear estimation.
Proceedings of the IEEE International Conference on Acoustics, 1978

