Helen M. Meng

ORCID: 0000-0002-4427-3532

Affiliations:
  • The Chinese University of Hong Kong
  • Massachusetts Institute of Technology, Cambridge, MA, USA (former)


According to our database, Helen M. Meng authored at least 554 papers between 1990 and 2024.

Awards

IEEE Fellow

IEEE Fellow 2013, "For contributions to spoken language and multimodal systems".

Bibliography

2024
Automatic selection of spoken language biomarkers for dementia detection.
Neural Networks, January, 2024

InstructTTS: Modelling Expressive TTS in Discrete Latent Space With Natural Language Style Prompt.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Decoding on Graphs: Faithful and Sound Reasoning on Knowledge Graphs through Generation of Well-Formed Chains.
CoRR, 2024

Towards Within-Class Variation in Alzheimer's Disease Detection from Spontaneous Speech.
CoRR, 2024

AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.
CoRR, 2024

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC.
CoRR, 2024

Speaking from Coarse to Fine: Improving Neural Codec Language Model via Multi-Scale Speech Coding and Generation.
CoRR, 2024

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions.
CoRR, 2024

SongCreator: Lyrics-based Universal Song Generation.
CoRR, 2024

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis.
CoRR, 2024

SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models.
CoRR, 2024

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024

Large Language Model-based FMRI Encoding of Language Functions for Subjects with Neurocognitive Disorder.
CoRR, 2024

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System.
CoRR, 2024

Autoregressive Speech Synthesis without Vector Quantization.
CoRR, 2024

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation.
CoRR, 2024

Purple-teaming LLMs with Adversarial Defender Training.
CoRR, 2024

Seamless Language Expansion: Enhancing Multilingual Mastery in Self-Supervised Models.
CoRR, 2024

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition.
CoRR, 2024

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner.
CoRR, 2024

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask.
CoRR, 2024

CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.
CoRR, 2024

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching.
CoRR, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder.
CoRR, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models.
CoRR, 2024

Injecting Linguistic Knowledge Into BERT for Dialogue State Tracking.
IEEE Access, 2024

Rethinking Machine Ethics - Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy.
Proceedings of the International Joint Conference on Neural Networks, 2024

UniAudio: Towards Universal Audio Generation with Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization.
Proceedings of the IEEE International Conference on Acoustics, 2024

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation.
Proceedings of the IEEE International Conference on Acoustics, 2024

SCNet: Sparse Compression Network for Music Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024

Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024

Multi-View Midivae: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

Dual Parameter-Efficient Fine-Tuning for Speaker Representation Via Speaker Prompt Tuning and Adapters.
Proceedings of the IEEE International Conference on Acoustics, 2024

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024

Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
Proceedings of the IEEE International Conference on Acoustics, 2024

Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Ontology-grounded Automatic Knowledge Graph Construction by LLM under Wikidata schema.
Proceedings of the KDD Workshop on Human-Interpretable AI 2024 co-located with 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024), 2024

Adaptive Query Rewriting: Aligning Rewriters through Marginal Probability of Conversational Answers.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Natural Language-Assisted Multi-modal Medication Recommendation.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

Designing Scaffolding Strategies for Conversational Agents in Dialog Task of Neurocognitive Disorders Screening.
Proceedings of the CHI Conference on Human Factors in Computing Systems, 2024

Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

COKE: A Cognitive Knowledge Graph for Machine Theory of Mind.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

SimCalib: Graph Neural Network Calibration Based on Similarity between Nodes.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
A phenomenographic approach on teacher conceptions of teaching Artificial Intelligence (AI) in K-12 schools.
Educ. Inf. Technol., January, 2023

Meta-Generalization for Domain-Invariant Speaker Verification.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Hiformer: Sequence Modeling Networks With Hierarchical Attention Mechanisms.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE Based Neural TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Language Agents for Detecting Implicit Stereotypes in Text-to-image Models at Scale.
CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.
CoRR, 2023

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning.
CoRR, 2023

CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
CoRR, 2023

SAIL: Search-Augmented Instruction Learning.
CoRR, 2023

Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing.
CoRR, 2023

Interpretable Unified Language Checking.
CoRR, 2023

InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt.
CoRR, 2023

Learning Analytics from Spoken Discussion Dialogs in Flipped Classroom.
CoRR, 2023

An experiential learning approach to learn AI in an online workshop.
Proceedings of the IEEE International Conference on Teaching, 2023

SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Imitation Learning from Expert Video Data for Dissection Trajectory Prediction in Endoscopic Surgical Procedure.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023, 2023

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Integrated and Enhanced Pipeline System to Support Spoken Language Analytics for Screening Neurocognitive Disorders.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

PunCantonese: A Benchmark Corpus for Low-Resource Cantonese Punctuation Restoration from Speech Transcripts.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Decision Support System for Chronic Diseases Based on Drug-Drug Interactions.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

GTN-Bailando: Genre Consistent long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network.
Proceedings of the IEEE International Conference on Acoustics, 2023

Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Keyword-Specific Acoustic Model Pruning for Open-Vocabulary Keyword Spotting.
Proceedings of the IEEE International Conference on Acoustics, 2023

CB-Conformer: Contextual Biasing Conformer for Biased Word Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Synthetic Corpus Generation Method for Neural Vocoder Training.
Proceedings of the IEEE International Conference on Acoustics, 2023

Exploiting Prompt Learning with Pre-Trained Language Models for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

TFCnet: Time-Frequency Domain Corrector for Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023

Contrastive Learning with Dialogue Attributes for Neural Dialogue Generation.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Sidecar Separator Can Convert A Single-Talker Speech Recognition System to A Multi-Talker One.
Proceedings of the IEEE International Conference on Acoustics, 2023

Av-Sepformer: Cross-Attention Sepformer for Audio-Visual Target Speaker Extraction.
Proceedings of the IEEE International Conference on Acoustics, 2023

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Pretrained Representations With Task-Related Keywords for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

Discriminative Speaker Representation Via Contrastive Learning with Class-Aware Attention in Angular Space.
Proceedings of the IEEE International Conference on Acoustics, 2023

Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023

Feature Selection and Text Embedding for Detecting Dementia from Spontaneous Cantonese.
Proceedings of the IEEE International Conference on Acoustics, 2023

Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Inter-Subnet: Speech Enhancement with Subband Interaction.
Proceedings of the IEEE International Conference on Acoustics, 2023

SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Search Augmented Instruction Learning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

The Defender's Perspective on Automatic Speaker Verification: An Overview.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Jointly Modelling Transcriptions and Phonemes with Optimal Features to Detect Dementia from Spontaneous Cantonese.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

Robust Representation Learning for Speech Emotion Recognition with Moment Exchange.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Creation and Evaluation of a Pretertiary Artificial Intelligence (AI) Curriculum.
IEEE Trans. Educ., 2022

Bayesian Neural Network Language Modeling for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations.
CoRR, 2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.
CoRR, 2022

Robust Unsupervised Cross-Lingual Word Embedding using Domain Flow Interpolation.
CoRR, 2022

Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
CoRR, 2022

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
CoRR, 2022

Cross-lingual Word Embeddings in Hyperbolic Space.
CoRR, 2022

On-the-fly Feature Based Speaker Adaptation for Dysarthric and Elderly Speech Recognition.
CoRR, 2022

Disentangling Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion.
CoRR, 2022

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks.
CoRR, 2022

Convex Polytope Modelling for Unsupervised Derivation of Semantic Structure for Data-efficient Natural Language Understanding.
CoRR, 2022

Toward Self-Learning End-to-End Dialog Systems.
CoRR, 2022

User Satisfaction Estimation with Sequential Dialogue Act Modeling in Goal-oriented Conversational Systems.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Developing an AI literacy test for junior secondary students: The first stage.
Proceedings of the IEEE International Conference on Teaching, 2022

Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Toward Self-Learning End-to-End Task-oriented Dialog Systems.
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

Speech-Vision Based Multi-Modal AI Control of a Magnetic Anchored and Actuated Endoscope.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2022

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

Overview of NLPCC 2022 Shared Task 7: Fine-Grained Dialogue Social Bias Measurement.
Proceedings of the Natural Language Processing and Chinese Computing, 2022

Partner Personas Generation for Dialogue Response Generation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Boosting the Performance of SpEx+ by Attention and Contextual Mechanism.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

HILvoice: Human-in-the-Loop Style Selection for Elder-Facing Speech Synthesis.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Spoofing-Aware Speaker Verification by Multi-Level Fusion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Exploring linguistic feature and model combination for speech recognition based automatic AD detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Context-aware Multimodal Fusion for Emotion Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Automatic Selection of Discriminative Features for Dementia Detection in Cantonese-Speaking People.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speech Enhancement with Fullband-Subband Cross-Attention Network.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-Channel Speaker Diarization Using Spatial Features for Meetings.
Proceedings of the IEEE International Conference on Acoustics, 2022

The CUHK-Tencent Speaker Diarization System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Characterizing the Adversarial Vulnerability of Speech Self-Supervised Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Partially Fake Audio Detection by Self-Attention-Based Fake Span Discovery.
Proceedings of the IEEE International Conference on Acoustics, 2022

Neural Architecture Search for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Adversarial Sample Detection for Speaker Verification by Neural Vocoders.
Proceedings of the IEEE International Conference on Acoustics, 2022

VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Multitask Learning Framework for Speaker Change Detection with Content Information from Unsupervised Speech Decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2022

Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022

Exploiting Cross Domain Acoustic-to-Articulatory Inverted Features for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2022

FullSubNet+: Channel Attention Fullsubnet with Complex Spectrograms for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Character-Level Span-Based Model for Mandarin Prosodic Structure Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2022

Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

COLD: A Benchmark for Chinese Offensive Language Detection.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

TalkTive: A Conversational Agent Using Backchannels to Engage Older Adults in Neurocognitive Disorders Screening.
Proceedings of the CHI '22: CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April 2022, 2022

On Controlling Fallback Responses for Grounded Dialogue Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout.
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

2021
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Speech Emotion Recognition Using Sequential Capsule Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Exemplar-Based Emotive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Recent Progress in the CUHK Dysarthric Speech Recognition System.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition.
CoRR, 2021

Partner Personas Generation for Diverse Dialogue Generation.
CoRR, 2021

Countering Online Hate Speech: An NLP Perspective.
CoRR, 2021

Spotting adversarial samples for speaker verification by neural vocoders.
CoRR, 2021

Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.
CoRR, 2021

Hierarchical Modeling for Out-of-Scope Domain and Intent Classification.
CoRR, 2021

Open Intent Discovery through Unsupervised Semantic Clustering and Dependency Parsing.
CoRR, 2021

Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech.
CoRR, 2021

Adversarially learning disentangled speech representations for robust multi-factor voice conversion.
CoRR, 2021

Creation and Evaluation of a Pre-tertiary Artificial Intelligence (AI) Curriculum.
CoRR, 2021

Unstructured Knowledge Access in Task-oriented Dialog Modeling using Language Inference, Knowledge Retrieval and Knowledge-Integrative Response Generation.
CoRR, 2021

Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Out-of-Scope Domain and Intent Classification through Hierarchical Joint Modeling.
Proceedings of the Conversational AI for Natural Human-Centric Interaction, 2021

Automatic Extraction of Semantic Patterns in Dialogs using Convex Polytopic Model.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Automatic Speaker-level Pronunciation Assessment of L2 Speech Using Posterior Probabilities from Multiple Utterances.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Exploring Cross-lingual Singing Voice Synthesis Using Speech Data.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Transformer Based End-to-End Mispronunciation Detection and Diagnosis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unsupervised Domain Adaptation for Dysarthric Speech Detection via Domain Adversarial Training and Mutual Information Minimization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

VQMIVC: Vector Quantization and Mutual Information-Based Unsupervised Speech Representation Disentanglement for One-Shot Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Channel-Wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Towards Multi-Scale Style Control for Expressive Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Adversarial Data Augmentation for Disordered Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FastSVC: Fast Cross-Domain Singing Voice Conversion With Feature-Wise Linear Modulation.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

A Joint Training Framework of Multi-Look Separator and Speaker Embedding Extractor for Overlapped Speech.
Proceedings of the IEEE International Conference on Acoustics, 2021

Development of the CUHK Elderly Speech Recognition System for Neurocognitive Disorder Detection Using the DementiaBank Corpus.
Proceedings of the IEEE International Conference on Acoustics, 2021

Bayesian Transformer Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Mixed Precision Quantization of Transformer Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models.
Proceedings of the IEEE International Conference on Acoustics, 2021

The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2VoC Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021

Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal.
Proceedings of the IEEE International Conference on Acoustics, 2021

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Comparative Study of Acoustic and Linguistic Features Classification for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021

Replay and Synthetic Speech Detection with Res2Net Architecture.
Proceedings of the IEEE International Conference on Acoustics, 2021

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
Proceedings of the IEEE International Conference on Acoustics, 2021

Emotion Controllable Speech Synthesis Using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Speaker Turn Aware Similarity Scoring for Diarization of Speech-Based Cognitive Assessments.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Dual Dropout Ranking of Linguistic Features for Alzheimer's Disease Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
An Integrated Approach of Machine Learning and Systems Thinking for Waiting Time Prediction in an Emergency Department.
Int. J. Medical Informatics, 2020

Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network.
CoRR, 2020

Neural Architecture Search for Speech Recognition.
CoRR, 2020

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech.
CoRR, 2020

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification.
Proceedings of the Odyssey 2020: The Speaker and Language Recognition Workshop, 2020

Speaker-Aware Linear Discriminant Analysis in Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Multi-Channel Recognition of Overlapped Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speech-XLNet: Unsupervised Acoustic Model Pretraining for Self-Attention Networks.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Exploiting Cross-Domain Visual Feature Generation for Disordered Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transferring Source Style in Non-Parallel Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Group Gated Fusion on Attention-Based Bidirectional Alignment for Multimodal Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Enhancing Monotonicity for Robust Autoregressive Transformer TTS.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigation of Data Augmentation Techniques for Disordered Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Audio-Visual Recognition of Overlapped Speech for the LRS2 Dataset.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Low-bit Quantization of Recurrent Neural Network Language Models Using Alternating Direction Methods of Multipliers.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Defense Against Adversarial Attacks on Spoofing Countermeasures of ASV.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Voice Conversion Via Cross-Modal Knowledge Distillation for Dysarthric Speech Reconstruction.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Accent Conversion Without Using Native Utterances.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adversarial Attacks on GMM I-Vector Based Speaker Verification Systems.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks.
CoRR, 2019

Semi-Supervised Graph Classification: A Hierarchical Graph Perspective.
Proceedings of the World Wide Web Conference, 2019

Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Methods for Audio Classification from Lecture Discussion Recordings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

One-Shot Voice Conversion with Global Speaker Embeddings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Exploiting Visual Features Using Bayesian Gated Neural Networks for Disordered Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On the Use of Pitch Features for Disordered Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

The CUHK Dysarthric Speech Recognition Systems for English and Cantonese.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Discriminative Representation Learning for Speech Emotion Recognition.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Recurrent Neural Network Language Model Training Using Natural Gradient.
Proceedings of the IEEE International Conference on Acoustics, 2019

Speech Emotion Recognition Using Capsule Networks.
Proceedings of the IEEE International Conference on Acoustics, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Compact Framework for Voice Conversion Using WaveNet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2019

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

CNN-RNN-CTC Based End-to-end Mispronunciation Detection and Diagnosis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Bayesian and Gaussian Process Neural Networks for Large Vocabulary Continuous Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Adversarial Attacks on Spoofing Countermeasures of Automatic Speaker Verification.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Query-by-Example Spoken Term Detection using Attentive Pooling Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks.
Speech Commun., 2018

How can we better use Twitter to find a person who got lost due to dementia?
npj Digit. Medicine, 2018

Data Visualization with IBM Watson Analytics for Global Cancer Trends Comparison from World Health Organization.
Int. J. Heal. Inf. Syst. Informatics, 2018

The HCCL-CUHK System for the Voice Conversion Challenge 2018.
Proceedings of the Odyssey 2018: The Speaker and Language Recognition Workshop, 2018

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

TATC: Predicting Alzheimer's Disease with Actigraphy Data.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

DNN i-vector based Fishervoice and PLDA SVM scoring for NIST SRE 2016.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Speech Super-Resolution Using Parallel WaveNet.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Development of the CUHK Dysarthric Speech Recognition System for the UA Speech Corpus.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speech and Language Processing for Learning and Wellbeing.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Unsupervised Discovery of Non-native Phonetic Patterns in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Gaussian Process Neural Networks for Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Feature Based Adaptation for Speaking Style Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Limited-Memory BFGS Optimization of Recurrent Neural Network Language Models for Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Social Media as a Tool to Look for People with Dementia Who Become Lost: Factors That Matter.
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018

Drawing-Based Automatic Dementia Screening Using Gaussian Process Markov Chains.
Proceedings of the 51st Hawaii International Conference on System Sciences, 2018

Machine Learning on Drawing Behavior for Dementia Screening.
Proceedings of the 2018 International Conference on Digital Health, 2018

Topic Discovery via Convex Polytopic Model: A Case Study with Small Corpora.
Proceedings of the 9th IEEE International Conference on Cognitive Infocommunications, 2018

Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

2017
Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Intonation classification for L2 English speech using multi-distribution deep neural networks.
Comput. Speech Lang., 2017

DNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A model of extended paragraph vector for document categorization and trend analysis.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Data Visualization on Global Trends on Cancer Incidence: An Application of IBM Watson Analytics.
Proceedings of the 50th Hawaii International Conference on System Sciences, 2017

Personal Wearable Devices to Measure Heart Rate Variability: A Framework of Cloud Platform for Public Health Research.
Proceedings of the 2017 International Conference on Digital Health, 2017

Classification of Visit-to-Visit Blood Pressure Variability: A Machine Learning Approach for Data Clustering on Systolic Blood Pressure Intervention Trial (SPRINT).
Proceedings of the 2017 International Conference on Digital Health, 2017

Parallel probabilistic swarm guidance by exploiting Kronecker product structures in discrete-time Markov chains.
Proceedings of the 2017 American Control Conference, 2017

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
A Two-Pass Framework of Mispronunciation Detection and Diagnosis for Computer-Aided Pronunciation Training.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Capitalizing on musical rhythm for prosodic training in computer-aided language learning.
Comput. Speech Lang., 2016

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition.
CoRR, 2016

Kronecker product approximation with multiple factor matrices via the tensor product algorithm.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

An embedding approach for context-aware collaborative recommendation and visualization.
Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, 2016

Exploratory data analysis on nuclei in Cantonese dysarthric speech.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

DBLSTM-based multi-task learning for pitch transformation in voice conversion.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Analysis on Gated Recurrent Unit Based Question Detection Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Personalized, Cross-Lingual TTS Using Phonetic Posteriorgrams.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Modular sensor system (MSS) for urban air pollution monitoring.
Proceedings of the 2016 IEEE SENSORS, Orlando, FL, USA, October 30 - November 3, 2016

Phonetic posteriorgrams for many-to-one voice conversion without parallel data training.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Recognizing stances in Mandarin social ideological debates with text and acoustic features.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Exploring articulatory characteristics of Cantonese dysarthric speech using distinctive features.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Question detection from acoustic features using recurrent neural network with gated recurrent unit.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Learning Track Representation and Trends for Conference Analytics.
Proceedings of the 49th Hawaii International Conference on System Sciences, 2016

Blood Pressure Monitoring on the Cloud System in Elderly Community Centres: A Data Capturing Platform for Application Research in Public Health.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016

Utilizing Real-Time Travel Information, Mobile Applications and Wearable Devices for Smart Public Transportation.
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016

2015
Introduction to the Special Section on Continuous Space and Related Methods in Natural Language Processing.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Deep Learning for Acoustic Modeling in Parametric Speech Generation: A systematic review of existing techniques and future trends.
IEEE Signal Process. Mag., 2015

A Survey of Wireless Sensor Network Based Air Pollution Monitoring Systems.
Sensors, 2015

Expressive talking avatar synthesis and animation.
Multim. Tools Appl., 2015

Acoustic to articulatory mapping with deep neural network.
Multim. Tools Appl., 2015

Generating emphatic speech with hidden Markov model for expressive speech synthesis.
Multim. Tools Appl., 2015

Preface.
J. Multimodal User Interfaces, 2015

Integrating acoustic and state-transition models for free phone recognition in L2 English speech using multi-distribution deep neural networks.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2015

Analysis of Dysarthric Speech using Distinctive Feature Recognition.
Proceedings of the 6th Workshop on Speech and Language Processing for Assistive Technologies, 2015

Improving automatic forced alignment for dysarthric speech transcription.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Development of a Cantonese dysarthric speech corpus.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

E-commu-book: an assistive technology for users with speech impairments.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Using tilt for automatic emphasis detection with Bayesian networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Topic modeling for conference analytics.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

A spectral space warping approach to cross-lingual voice transformation in HMM-based TTS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A deep recurrent approach for acoustic-to-articulatory inversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

Blood Pressure Management with Data Capturing in the Cloud among Hypertensive Patients: A Monitoring Platform for Hypertensive Patients.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

A Data Capturing Platform in the Cloud for Behavioral Analysis among Smokers: An Application Platform for Public Health Research.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

Embracing Big Data for Simulation Modelling of Emergency Department Processes and Activities.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

A Real-Time Decision Support Tool for Disaster Response: A Mathematical Programming Approach.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

Indoor Air Monitoring Platform and Personal Health Reporting System: Big Data Analytics for Public Health Research.
Proceedings of the 2015 IEEE International Congress on Big Data, New York City, NY, USA, June 27, 2015

A two-pass framework of mispronunciation detection & diagnosis for computer-aided pronunciation training.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Latent Semantic Analysis for Multimodal User Input With Speech and Gestures.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014

Head and facial gestures synthesis using PAD model for an expressive talking avatar.
Multim. Tools Appl., 2014

Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.
J. Comput. Sci. Technol., 2014

SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis.
Proceedings of the 8th International Workshop on Semantic Evaluation, 2014

An Integration of Random Subspace Sampling and Fishervoice for Speaker Verification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Automatic speech data clustering with human perception based weighted distance.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Mispronunciation detection and diagnosis in L2 English speech using multi-distribution Deep Neural Networks.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

PLDA modeling in the fishervoice subspace for speaker verification.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Statistical parametric speech synthesis using weighted multi-distribution deep belief network.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Contrastive auto-encoder for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Learning dynamic features with neural networks for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Phonological modeling of mispronunciation gradations in L2 English speech of L1 Chinese learners.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.
CoRR, 2013

Predicting gradation of L2 English mispronunciations using crowdsourced ratings and phonological rules.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2013

Lexical stress detection for L2 English speech using deep belief networks.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Investigation of tandem deep belief network approach for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Audiovisual synthesis of exaggerated speech for corrective feedback in computer-assisted pronunciation training.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Clustering similar acoustic classes in the Fishervoice framework.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Multi-distribution deep belief network for speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device.
Proceedings of the IEEE 4th International Conference on Cognitive Infocommunications, 2013

Predicting gradation of L2 English mispronunciations using ASR with extended recognition network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012
Farewell Editorial.
IEEE Trans. Speech Audio Process., 2012

Phoneme-level articulatory animation in pronunciation training.
Speech Commun., 2012

Predicting User Satisfaction in Spoken Dialog System Evaluation With Collaborative Filtering.
IEEE J. Sel. Top. Signal Process., 2012

Welcome message from the conference chair.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

mENUNCIATE: Development of a computer-aided pronunciation training system on a cross-platform framework for mobile, speech-enabled application development.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Perceptually-motivated assessment of automatically detected lexical stress in L2 learners' speech.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Analysis on mispronunciations in CAPT based on computational speech perception.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

The Use of DBN-HMMs for Mispronunciation Detection and Diagnosis in L2 English to Support Computer-Aided Pronunciation Training.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Modeling the correlation between modality semantics and facial expressions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011
On Mispronunciation Lexicon Generation Using Joint-Sequence Multigrams in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Prominence Model for Prosodic Features in Automatic Lexical Stress and Pitch Accent Detection.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

An Analysis Framework Based on Random Subspace Sampling for Speaker Verification.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Design and Collection of an L2 English Corpus with a Suprasegmental Focus for Chinese Learners of English.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

Allophonic variations in visual speech synthesis for corrective feedback in CAPT.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

The HKCUPU system for the NIST 2010 speaker recognition evaluation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

2010
Pseudo-Conventional N-Gram Representation of the Discriminative N-Gram Model for LVCSR.
IEEE J. Sel. Top. Signal Process., 2010

Introduction to the Issue on Statistical Learning Methods for Speech and Language Processing.
IEEE J. Sel. Top. Signal Process., 2010

Using finite state machines for evaluating spoken dialog systems.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Collaborative filtering model for user satisfaction prediction in Spoken Dialog System evaluation.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Collection of user judgments on spoken dialog system with crowdsourcing.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Predicting user evaluations of spoken dialog systems using semi-supervised learning.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions.
Proceedings of the 15th International Conference on Intelligent User Interfaces, 2010

Modeling prosody patterns for Chinese expressive text-to-speech synthesis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Development of an articulatory visual-speech synthesizer to support language learning.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Capturing L2 segmental mispronunciations with joint-sequence models in Computer-Aided Pronunciation Training (CAPT).
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Detection of intonation in L2 English speech of native Mandarin learners.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

An enhanced Fishervoice subspace framework for text-independent speaker verification.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Discriminative acoustic model for improving mispronunciation detection and diagnosis in computer-aided pronunciation training (CAPT).
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Automatic derivation of phonological rules for mispronunciation detection in a computer-assisted pronunciation training system.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Statistical phone duration modeling to filter for intact utterances in a computer-assisted pronunciation training system.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Fishervoice: A discriminant subspace framework for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.
Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010

2009
Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.
IEEE Trans. Speech Audio Process., 2009

Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions.
IEEE Trans. Speech Audio Process., 2009

Implementation of an extended recognition network for mispronunciation detection and diagnosis in computer-assisted pronunciation training.
Proceedings of the ISCA International Workshop on Speech and Language Technology in Education, 2009

Developing Speech Recognition and Synthesis Technologies to Support Computer-Aided Pronunciation Training for Chinese Learners of English.
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, 2009

Studying L2 suprasegmental features in Asian Englishes: a position paper.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Audiovisual Tools for Phonetic and Articulatory Visualization in Computer-Aided Pronunciation Training.
Proceedings of the Development of Multimodal Interfaces: Active Listening and Synchrony, 2009

Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features.
Proceedings of ACL 2009, 2009

2008
The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Decision Fusion for Improving Mispronunciation Detection Using Language Transfer Knowledge and Phoneme-Dependent Pronunciation Scoring.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A New Prosodic Strength Calculation Method for Prosody Reduction Modeling.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Automatic generation and pruning of phonetic mispronunciations to support computer-aided pronunciation training.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Improving mispronunciation detection and diagnosis of learners' speech with context-sensitive phonological rules based on language transfer.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Recasting the discriminative n-gram model as a pseudo-conventional n-gram model for LVCSR.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007
Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling.
IEEE Trans. Computers, 2007

Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2007

High-level feature-based speaker verification via articulatory phonetic-class pronunciation modeling.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Complementarity and redundancy in multimodal user inputs with speech and pen gestures.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Modeling the statistical behavior of lexical chains to capture word cohesiveness for automatic story segmentation.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Effects of Device Mismatch, Language Mismatch and Environmental Mismatch on Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Adaptive Weight Estimation in Multi-Biometric Verification using Fuzzy Logic Decision Fusion.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Discriminant Mutual Subspace Learning for Indoor and Outdoor Face Recognition.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Deriving salient learners' mispronunciations from cross-language phonological comparisons.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar.
Proceedings of the International Conference on Affective Computing and Intelligent Interaction, 2007

2006
Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis.
Proceedings of the 2006 IEEE/ACL Spoken Language Technology Workshop, 2006

A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2006

A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

A Corpus-Based Approach for Cooperative Response Generation in a Dialog System.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Initial Experiments on Automatic Story Segmentation in Chinese Spoken Documents Using Lexical Cohesion of Extracted Named Entities.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

A multi-pass error detection and correction framework for Mandarin LVCSR.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Joint interpretation of input speech and pen gestures for multimodal human-computer interaction.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Multi-level Fusion of Audio and Visual Features for Speaker Identification.
Proceedings of the International Conference on Advances in Biometrics, 2006

A Comparative Study of Discriminative Methods for Reranking LVCSR N-Best Hypotheses in Domain Adaptation and Generalization.
Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006

2005
The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance.
Proceedings of HLT/EMNLP 2005, 2005

Embedded Cantonese TTS for multi-device access to web content.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

2004
ISIS: an adaptive, trilingual conversational system with interleaving interaction and delegation dialogs.
ACM Trans. Comput. Hum. Interact., 2004

Multi-Scale Spoken Document Retrieval for Cantonese Broadcast News.
Int. J. Speech Technol., 2004

Mandarin-English Information (MEI): investigating translingual speech retrieval.
Comput. Speech Lang., 2004

Error identification for large vocabulary speech recognition.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Bilingual response generation using semi-automatically-induced templates for a mixed-initiative dialog system.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Prosody and style controls in CU VOCAL using SSML and SAPI XML tags.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Detection of language boundary in code-switching utterances by bi-phone probabilities.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A two-level schema for detecting recognition errors.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Fuzzy logic decision fusion in a multimodal biometric system.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A Pruning Approach for GMM-Based Speaker Verification in Mobile Embedded Systems.
Proceedings of the First International Conference on Biometric Authentication, 2004

A real-time Cantonese text-to-audiovisual speech synthesizer.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Bilingual Chinese/English voice browsing based on a VoiceXML platform.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

English-Chinese bilingual text-independent speaker verification.
Proceedings of the 2004 IEEE International Conference on Acoustics, Speech and Signal Processing, 2004

Using Verb Dependency Matching in a Reading Comprehension System.
Proceedings of the Information Retrieval Technology, Asia Information Retrieval Symposium, 2004

2003
The use of belief networks for mixed-initiative dialog modeling.
IEEE Trans. Speech Audio Process., 2003

Cross-language spoken document retrieval using HMM-based retrieval model with multi-scale fusion.
ACM Trans. Asian Lang. Inf. Process., 2003

CU VOCAL Web Service: A Text-to-speech Synthesis Web Service for Voice-enabled Web-mediated Applications.
Proceedings of the Twelfth International World Wide Web Conference - Posters, 2003

Example-based bi-directional Chinese-English machine translation with semi-automatically induced grammars.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Natural language response generation in mixed-initiative dialogs using task goals and dialog acts.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Recent enhancements in CU VOCAL for Chinese TTS-enabled applications.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multi-scale document expansion in English-Mandarin cross-language spoken document retrieval.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech and Signal Processing, 2003

2002
Semiautomatic Acquisition of Semantic Structures for Understanding Domain-Specific Natural Language Queries.
IEEE Trans. Knowl. Data Eng., 2002

A system for spoken query information retrieval on mobile devices.
IEEE Trans. Speech Audio Process., 2002

GLR parsing with multiple grammars for natural language queries.
ACM Trans. Asian Lang. Inf. Process., 2002

Spoken language resources for Cantonese speech processing.
Speech Commun., 2002

Intelligent speech for information systems: towards biliteracy and trilingualism.
Interact. Comput., 2002

Improvements on a belief network framework for natural language understanding of domain-specific Chinese queries.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Intelligent speech for information systems (ISIS): a multi-modal, trilingual, distributed conversational system with combined interaction and delegation dialogs.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

The effect of tonal context on Cantonese concatenative speech synthesis.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

ISIS: a multi-modal, trilingual, distributed spoken dialog system developed with CORBA, Java, XML and KQML.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Multi-scale and multi-model integration for improved performance in Chinese spoken document retrieval.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001
A hierarchical lexical representation for bi-directional spelling-to-pronunciation/pronunciation-to-spelling generation.
Speech Commun., 2001

Using contextual analysis for news event detection.
Int. J. Intell. Syst., 2001

Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks.
Proceedings of the 14th Conference on Computational Linguistics and Speech Processing, 2001

Learning Strategies In A Grammar Induction Framework.
Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium, 2001

Scalability and Portability of a Belief Network-based Dialog Model for Different Application Domains.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Mandarin-English Information: Investigating Translingual Speech Retrieval.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Automatic event generation from multi-lingual news stories.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2001

Automatic Grammar Partitioning for Syntactic Parsing.
Proceedings of the Seventh International Workshop on Parsing Technologies (IWPT-2001), 2001

Multi-parser architecture for query processing.
Proceedings of EUROSPEECH 2001 Scandinavia, 2001

Semi-automatic grammar induction for bi-directional English-Chinese machine translation.
Proceedings of EUROSPEECH 2001 Scandinavia, 2001

ISIS: a learning system with combined interaction and delegation dialogs.
Proceedings of EUROSPEECH 2001 Scandinavia, 2001

Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system.
Proceedings of EUROSPEECH 2001 Scandinavia, 2001

Multi-scale-audio indexing for translingual spoken document retrieval.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2001

A dynamic semantic model for re-scoring recognition hypotheses.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2001

Speech retrieval with video parsing for television news programs.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2001

Automatic Story Segmentation for Spoken Document Retrieval.
Proceedings of the 10th IEEE International Conference on Fuzzy Systems, 2001

2000
HCI and the 3C convergence.
ACM SIGCHI Bull., 2000

Initial Development Towards a Trilingual Speech Interface for Financial Information Inquiries.
Int. J. Speech Technol., 2000

Parsing a Lattice with Multiple Grammars.
Proceedings of the Sixth International Workshop on Parsing Technologies, 2000

Comprehension Across Application Domains and Languages.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

Sub-Syllabic Acoustic Modeling Across Chinese Dialects.
Proceedings of the 2000 International Symposium on Chinese Spoken Language Processing, 2000

Query expansion using phonetic confusions for Chinese spoken document retrieval.
Proceedings of the Fifth International Workshop on Information Retrieval with Asian Languages, Hong Kong, China, September 30, 2000

Multi-scale audio indexing for Chinese spoken document retrieval.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

ISIS: A multilingual spoken dialog system developed with CORBA and KQML agents.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Grammar partitioning and parser composition for natural language understanding.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

CU FOREX: a bilingual spoken dialog system for foreign exchange enquiries.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2000

Concatenating syllables for response generation in spoken language applications.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2000

Intelligent speech for information systems: towards biliteracy and trilingualism.
Proceedings of the 2000 Conference on Universal Usability, 2000

1999
An Analytical Study of Transformational Tagging for Chinese Text.
Proceedings of the 12th Research on Computational Linguistics Conference, 1999

Semi-automatic acquisition of domain-specific semantic structures.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

To believe is to understand.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Micro-prosodic control in Cantonese text-to-speech synthesis.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

1997
From interface to content: translingual access and delivery of on-line information.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

YINHE: a Mandarin Chinese version of the GALAXY system.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

1996
Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology.
Speech Commun., 1996

Multilingual human-computer interactions: from information access to language learning.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

ANGIE: a new framework for speech analysis based on morpho-phonological modelling.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

WHEELS: a conversational system in the automobile classifieds domain.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

A form-based dialogue manager for spoken language applications.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

1995
Phonological parsing for bi-directional letter-to-sound/sound-to-letter generation.
PhD thesis, 1995

1994
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation.
Proceedings of the Human Language Technology Workshop, 1994

Phonological parsing for reversible letter-to-sound/sound-to-letter generation.
Proceedings of ICASSP '94: IEEE International Conference on Acoustics, Speech and Signal Processing, 1994

1993
Reversible letter-to-sound/sound-to-letter generation based on parsing word morphology.
Proceedings of the Third European Conference on Speech Communication and Technology, 1993

1992
Language modelling for recognition and understanding using layered bigrams.
Proceedings of the Second International Conference on Spoken Language Processing, 1992

1991
Signal Representation, Attribute Extraction, and the Use of Distinctive Features for Phonetic Classification.
Proceedings of the Speech and Natural Language Workshop, 1991

Signal representation comparison for phonetic classification.
Proceedings of the 1991 International Conference on Acoustics, Speech and Signal Processing, 1991

1990
A comparative study of acoustic representations of speech for vowel classification using multi-layer perceptrons.
Proceedings of the First International Conference on Spoken Language Processing, 1990

