Helen M. Meng
ORCID: 0000-0002-4427-3532
Affiliations:
- The Chinese University of Hong Kong
- Massachusetts Institute of Technology, Cambridge, MA, USA (former)
According to our database, Helen M. Meng authored at least 554 papers between 1990 and 2024.
Awards
IEEE Fellow 2013, "For contributions to spoken language and multimodal systems".
Bibliography
2024
Neural Networks, January, 2024
InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains.
CoRR, 2024
Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech.
CoRR, 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.
CoRR, 2024
CoRR, 2024
Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
CoRR, 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions.
CoRR, 2024
SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.
CoRR, 2024
SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024
Large Language Model-based fMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder.
CoRR, 2024
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System.
CoRR, 2024
Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation.
CoRR, 2024
Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models.
CoRR, 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition.
CoRR, 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask.
CoRR, 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.
CoRR, 2024
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching.
CoRR, 2024
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.
CoRR, 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
CoRR, 2024
IEEE Access, 2024
Rethinking Machine Ethics - Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024
Multi-View MidiVAE: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024
Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters.
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
Proceedings of the IEEE International Conference on Acoustics, 2024
StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema.
Proceedings of the KDD Workshop on Human-Interpretable AI 2024 co-located with 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024), 2024
Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024
Designing Scaffolding Strategies for Conversational Agents in Dialog Task of Neurocognitive Disorders Screening.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
A phenomenographic approach on teacher conceptions of teaching Artificial Intelligence (AI) in K-12 schools.
Educ. Inf. Technol., January, 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
CoRR, 2023
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
CoRR, 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing.
CoRR, 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023
Proceedings of the IEEE International Conference on Teaching, 2023
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Imitation Learning from Expert Video Data for Dissection Trajectory Prediction in Endoscopic Surgical Procedure.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023
GTN-Bailando: Genre Consistent Long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network.
Proceedings of the IEEE International Conference on Acoustics, 2023
Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Exploiting Prompt Learning with Pre-Trained Language Models for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Discriminative Speaker Representation Via Contrastive Learning with Class-Aware Attention in Angular Space.
Proceedings of the IEEE International Conference on Acoustics, 2023
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023
Feature Selection and Text Embedding for Detecting Dementia from Spontaneous Cantonese.
Proceedings of the IEEE International Conference on Acoustics, 2023
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Jointly Modelling Transcriptions and Phonemes with Optimal Features to Detect Dementia from Spontaneous Cantonese.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
2022
IEEE Trans. Educ., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.
CoRR, 2022
CoRR, 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
CoRR, 2022
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
CoRR, 2022
On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
CoRR, 2022
Disentangling Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion.
CoRR, 2022
CoRR, 2022
Convex Polytope Modelling for Unsupervised Derivation of Semantic Structure for Data-efficient Natural Language Understanding.
CoRR, 2022
User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022
Proceedings of the IEEE International Conference on Teaching, 2022
Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022
Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Proceedings of the Natural Language Processing and Chinese Computing, 2022
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Automatic Selection of Discriminative Features for Dementia Detection in Cantonese-Speaking People.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022
A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
NeuFA: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2022
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2022
FullSubNet+: Channel Attention FullSubNet with Complex Spectrograms for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis.
Proceedings of the 29th International Conference on Computational Linguistics, 2022
TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening.
Proceedings of the CHI '22: CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April 2022, 2022
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
CoRR, 2021
Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.
CoRR, 2021
Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing.
CoRR, 2021
Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech.
CoRR, 2021
Adversarially learning disentangled speech representations for robust multi-factor voice conversion.
CoRR, 2021
CoRR, 2021
Unstructured Knowledge Access in Task-oriented Dialog Modeling using Language Inference, Knowledge Retrieval and Knowledge-Integrative Response Generation.
CoRR, 2021
Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Automatic Speaker-level Pronunciation Assessment of L2 Speech Using Posterior Probabilities from Multiple Utterances.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
FastSVC: Fast Cross-Domain Singing Voice Conversion With Feature-Wise Linear Modulation.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021
A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Proceedings of the IEEE International Conference on Acoustics, 2021
Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models.
Proceedings of the IEEE International Conference on Acoustics, 2021
The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2VoC Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Emotion Controllable Speech Synthesis Using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Speaker Turn Aware Similarity Scoring for Diarization of Speech-Based Cognitive Assessments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
An Integrated Approach of Machine Learning and Systems Thinking for Waiting Time Prediction in an Emergency Department.
Int. J. Medical Informatics, 2020
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network.
CoRR, 2020
Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020
Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
CoRR, 2019
Proceedings of the World Wide Web Conference, 2019
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2019
Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks.
Speech Commun., 2018
npj Digit. Medicine, 2018
Data Visualization with IBM Watson Analytics for Global Cancer Trends Comparison from World Health Organization.
Int. J. Heal. Inf. Syst. Informatics, 2018
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018
Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.
Proceedings of the 2018 ACM Multimedia Conference, 2018
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Social Media as a Tool to Look for People with Dementia Who Become Lost: Factors That Matter.
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018
Proceedings of the 2018 International Conference on Digital Health, 2018
Proceedings of the 9th IEEE International Conference on Cognitive Infocommunications, 2018
Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018
2017
Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
Intonation classification for L2 English speech using multi-distribution deep neural networks.
Comput. Speech Lang., 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017
Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Data Visualization on Global Trends on Cancer Incidence: An Application of IBM Watson Analytics.
Proceedings of the 50th Hawaii International Conference on System Sciences, 2017
Personal Wearable Devices to Measure Heart Rate Variability: A Framework of Cloud Platform for Public Health Research.
Proceedings of the 2017 International Conference on Digital Health, 2017
Classification of Visit-to-Visit Blood Pressure Variability: A Machine Learning Approach for Data Clustering on Systolic Blood Pressure Intervention Trial (SPRINT).
Proceedings of the 2017 International Conference on Digital Health, 2017
Parallel probabilistic swarm guidance by exploiting Kronecker product structures in discrete-time Markov chains.
Proceedings of the 2017 American Control Conference, 2017
Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016
Capitalizing on musical rhythm for prosodic training in computer-aided language learning.
Comput. Speech Lang., 2016
CoRR, 2016
Kronecker product approximation with multiple factor matrices via the tensor product algorithm.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016
An embedding approach for context-aware collaborative recommendation and visualization.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 2016 IEEE SENSORS, Orlando, FL, USA, October 30 - November 3, 2016, 2016
Phonetic posteriorgrams for many-to-one voice conversion without parallel data training.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016
Recognizing stances in Mandarin social ideological debates with text and acoustic features.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016
Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Exploring articulatory characteristics of Cantonese dysarthric speech using distinctive features.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Question detection from acoustic features using recurrent neural network with gated recurrent unit.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 49th Hawaii International Conference on System Sciences, 2016
Blood Pressure Monitoring on the Cloud System in Elderly Community Centres: A Data Capturing Platform for Application Research in Public Health.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016
Utilizing Real-Time Travel Information, Mobile Applications and Wearable Devices for Smart Public Transportation.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016
2015
Introduction to the Special Section on Continuous Space and Related Methods in Natural Language Processing.
IEEE ACM Trans. Audio Speech Lang. Process., 2015
Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.
IEEE Signal Process. Mag., 2015
Sensors, 2015
Multim. Tools Appl., 2015
Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015
A spectral space warping approach to cross-lingual voice transformation in HMM-based TTS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015
Blood Pressure Management with Data Capturing in the Cloud among Hypertensive Patients: A Monitoring Platform for Hypertensive Patients.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015
A Data Capturing Platform in the Cloud for Behavioral Analysis among Smokers: An Application Platform for Public Health Research.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015
Embracing Big Data for Simulation Modelling of Emergency Department Processes and Activities.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015
A Real-Time Decision Support Tool for Disaster Response: A Mathematical Programming Approach.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015
Indoor Air Monitoring Platform and Personal Health Reporting System: Big Data Analytics for Public Health Research.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015
A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
IEEE ACM Trans. Audio Speech Lang. Process., 2014
Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014
Multim. Tools Appl., 2014
Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.
J. Comput. Sci. Technol., 2014
SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Statistical parametric speech synthesis using weighted multi-distribution deep belief network.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Phonological modeling of mispronunciation gradations in L2 English speech of L1 Chinese learners.
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.
CoRR, 2013
Predicting gradation of L2 English mispronunciations using crowdsourced ratings and phonological rules.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Audiovisual synthesis of exaggerated speech for corrective feedback in computer-assisted pronunciation training.
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device.
Proceedings of the IEEE 4th International Conference on Cognitive Infocommunications, 2013
Predicting gradation of L2 English mispronunciations using ASR with extended recognition network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
2012
Predicting User Satisfaction in Spoken Dialog System Evaluation With Collaborative Filtering.
IEEE J. Sel. Top. Signal Process., 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
mENUNCIATE: Development of a computer-aided pronunciation training system on a cross-platform framework for mobile, speech-enabled application development.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Perceptually-motivated assessment of automatically detected lexical stress in L2 learners' speech.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Design and Collection of an L2 English Corpus with a Suprasegmental Focus for Chinese Learners of English.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
2010
Pseudo-Conventional N-Gram Representation of the Discriminative N-Gram Model for LVCSR.
IEEE J. Sel. Top. Signal Process., 2010
Introduction to the Issue on Statistical Learning Methods for Speech and Language Processing.
IEEE J. Sel. Top. Signal Process., 2010
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Collaborative filtering model for user satisfaction prediction in Spoken Dialog System evaluation.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions.
Proceedings of the 15th International Conference on Intelligent User Interfaces, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Development of an articulatory visual-speech synthesizer to support language learning.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
An enhanced Fishervoice subspace framework for text-independent speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Statistical phone duration modeling to filter for intact utterances in a computer-assisted pronunciation training system.
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.
Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010
2009
Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.
IEEE Trans. Speech Audio Process., 2009
Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions.
IEEE Trans. Speech Audio Process., 2009
Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009
Developing Speech Recognition and Synthesis Technologies to Support Computer-Aided Pronunciation Training for Chinese Learners of English.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training.
Proceedings of the Development of Multimodal Interfaces: Active Listening and Synchrony, 2009
Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features.
Proceedings of the ACL 2009, 2009
2008
The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Decision Fusion for Improving Mispronunciation Detection Using Language Transfer Knowledge and Phoneme-Dependent Pronunciation Scoring.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008
Recasting the discriminative n-gram model as a pseudo-conventional n-gram model for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling.
IEEE Trans. Computers, 2007
Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2007
High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Complementarity and redundancy in multimodal user inputs with speech and pen gestures.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar.
Proceedings of the IEEE International Conference on Acoustics, 2007
Effects of Device Mismatch, Language Mismatch and Environmental Mismatch on Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2007
Adaptive Weight Estimation in Multi-Biometric Verification using Fuzzy Logic Decision Fusion.
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007
Deriving salient learners' mispronunciations from cross-language phonological comparisons.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007
Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar.
Proceedings of the Affective Computing and Intelligent Interaction, 2007
2006
Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006
A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2006
A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Joint interpretation of input speech and pen gestures for multimodal human-computer interaction.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Advances in Biometrics, International Conference, 2006
A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance.
Proceedings of the HLT/EMNLP 2005, 2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
2004
ISIS: an adaptive, trilingual conversational system with interleaving interaction and delegation dialogs.
ACM Trans. Comput. Hum. Interact., 2004
Int. J. Speech Technol., 2004
Comput. Speech Lang., 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Bilingual response generation using semi-automatically-induced templates for a mixed-initiative dialog system.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Detection of language boundary in code-switching utterances by bi-phone probabilities.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the Biometric Authentication, First International Conference, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
Proceedings of the Information Retrieval Technology, Asia Information Retrieval Symposium, 2004
2003
IEEE Trans. Speech Audio Process., 2003
Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion.
ACM Trans. Asian Lang. Inf. Process., 2003
CU VOCAL Web Service: A Text-to-speech Synthesis Web Service for Voice-enabled Web-mediated Applications.
Proceedings of the Twelfth International World Wide Web Conference - Posters, 2003
Example-based bi-directional Chinese-English machine translation with semi-automatically induced grammars.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Natural language response generation in mixed-initiative dialogs using task goals and dialog acts.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
2002
Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries.
IEEE Trans. Knowl. Data Eng., 2002
IEEE Trans. Speech Audio Process., 2002
ACM Trans. Asian Lang. Inf. Process., 2002
Interact. Comput., 2002
Improvements on a belief network framework for natural language understanding of domain-specific Chinese queries.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Intelligent speech for information systems (ISIS): a multi-modal, trilingual, distributed conversational system with combined interaction and delegation dialogs.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
ISIS: a multi-modal, trilingual, distributed spoken dialog system developed with CORBA, Java, XML and KQML.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Multi-scale and multi-model integration for improved performance in Chinese spoken document retrieval.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
2001
A hierarchical lexical representation for bi-directional spelling-to-pronunciation/pronunciation-to-spelling generation.
Speech Commun., 2001
Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks.
Proceedings of the 14th Conference on Computational Linguistics and Speech Processing, 2001
Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 2001
Scalability and Portability of a Belief Network-based Dialog Model for Different Application Domains.
Proceedings of the First International Conference on Human Language Technology Research, 2001
Proceedings of the First International Conference on Human Language Technology Research, 2001
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2001
Automatic Grammar Partitioning for Syntactic Parsing.
Proceedings of the Seventh International Workshop on Parsing Technologies (IWPT-2001), 2001
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Semi-automatic grammar induction for bi-directional English-Chinese machine translation.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
Proceedings of the IEEE International Conference on Acoustics, 2001
Proceedings of the IEEE International Conference on Acoustics, 2001
Proceedings of the IEEE International Conference on Acoustics, 2001
Proceedings of the 10th IEEE International Conference on Fuzzy Systems, 2001
2000
Initial Development Towards a Trilingual Speech Interface for Financial Information Inquiries.
Int. J. Speech Technol., 2000
Proceedings of the Sixth International Workshop on Parsing Technologies, 2000
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000
Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages, 2000, Hong Kong, China, September 30, 2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
Proceedings of the IEEE International Conference on Acoustics, 2000
Proceedings of the IEEE International Conference on Acoustics, 2000
Proceedings of the 2000 Conference on Universal Usability, 2000
1999
Proceedings of the 12th Research on Computational Linguistics Conference, 1999
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999
1997
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997
1996
Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology.
Speech Commun., 1996
Multilingual human-computer interactions: from information access to language learning.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996
Proceedings of the 4th International Conference on Spoken Language Processing, 1996
Proceedings of the 4th International Conference on Spoken Language Processing, 1996
Proceedings of the 4th International Conference on Spoken Language Processing, 1996
1995
PhD thesis, 1995
1994
Proceedings of the Human Language Technology, 1994
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, 1994
1993
Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993
1992
Proceedings of the Second International Conference on Spoken Language Processing, 1992
1991
Signal Representation, Attribute Extraction, and the Use of Distinctive Features for Phonetic Classification.
Proceedings of the Speech and Natural Language, 1991
Proceedings of the 1991 International Conference on Acoustics, 1991
1990
A comparative study of acoustic representations of speech for vowel classification using multi-layer perceptrons.
Proceedings of the First International Conference on Spoken Language Processing, 1990