Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020

Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition.

Michael Lao BanTeng

Zhiyong Wu

Proceedings of the 25th International Conference on Pattern Recognition, 2020

End-To-End Accent Conversion Without Using Native Utterances.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Speech-XLNet: Unsupervised Acoustic Model Pretraining For Self-Attention Networks.

CoRR, 2019

One-Shot Voice Conversion with Global Speaker Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Discriminative Representation Learning for Speech Emotion Recognition.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network.

Yulan Chen

Jia Jia

Zhiyong Wu

Proceedings of the International Conference on Multimodal Interaction, 2019

Speech Emotion Recognition Using Capsule Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

NN-based Ordinal Regression for Assessing Fluency of ESL Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

End-to-end Code-switched TTS with Mix of Monolingual Recordings.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Query-by-Example Spoken Term Detection using Attentive Pooling Networks.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Automatic Prosodic Structure Labeling using DNN-BGRU-CRF Hybrid Neural Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Prosodic Structure Prediction using Deep Self-attention Neural Network.

Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection.

Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019

2018

Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks.

Speech Commun., 2018

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.

Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Speech Super-Resolution Using Parallel WaveNet.

Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.

Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Feature Based Adaptation for Speaking Style Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices.

Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning.

Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

2017

Movie Recommendation via BLSTM.

Song Tang

Zhiyong Wu

Kang Chen

Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition.

CoRR, 2016

A Real-Time Gesture-Based Unmanned Aerial Vehicle Control System.

Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

Video Inpainting Based on Joint Gradient and Noise Minimization.

Yiqi Jiang

Xin Jin

Zhiyong Wu

Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016

3D modeling based on multiple Unmanned Aerial Vehicles with the optimal paths.

Leye Wei

Xin Jin

Zhiyong Wu

Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems, 2016

DBLSTM-based multi-task learning for pitch transformation in voice conversion.

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Analysis on Gated Recurrent Unit Based Question Detection Approach.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Heterogeneity-entropy based unsupervised feature learning for personality prediction with cross-media data.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Recognizing stances in Mandarin social ideological debates with text and acoustic features.

Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Question detection from acoustic features using recurrent neural network with gated recurrent unit.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Acoustic to articulatory mapping with deep neural network.

Multim. Tools Appl., 2015

Generating emphatic speech with hidden Markov model for expressive speech synthesis.

Multim. Tools Appl., 2015

Polyphonic Music Modelling with LSTM-RTRBM.

Qi Lyu

Zhiyong Wu

Jun Zhu

Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015

Using tilt for automatic emphasis detection with Bayesian networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

A deep recurrent approach for acoustic-to-articulatory inversion.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.

Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.

Multim. Tools Appl., 2014

Head and facial gestures synthesis using PAD model for an expressive talking avatar.

Multim. Tools Appl., 2014

Automatic speech data clustering with human perception based weighted distance.

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Multi-channel speech enhancement using sparse coding on local time-frequency structures.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Using conditional random fields to predict focus word pair in spontaneous spoken English.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Contrastive auto-encoder for phoneme recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Learning dynamic features with neural networks for phoneme recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Automatic Emotion Variation Detection in continuous speech.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.

CoRR, 2013

Investigation of tandem deep belief network approach for phoneme recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

A real-time speech driven talking avatar based on deep neural network.

Kai Zhao

Zhiyong Wu

Lianhong Cai

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Frequency-domain dereverberation on speech signal using surround retinex.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Sparse coding for sound event classification.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

Comparison of adaptation methods for GMM-SVM based speech emotion recognition.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis.

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Modeling the correlation between modality semantics and facial expressions.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

2010

Modeling prosody patterns for Chinese expressive text-to-speech synthesis.

Zhiyong Wu

Lianhong Cai

Helen M. Meng

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Comparison of Syllable/Phone HMM Based Mandarin TTS.

Proceedings of the 20th International Conference on Pattern Recognition, 2010

Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.

Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010

2009

Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.

IEEE Trans. Speech Audio Process., 2009

2008

The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes.

Zhiyong Wu

Jiying Wu

Helen M. Meng

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A New Prosodic Strength Calculation Method for Prosody Reduction Modeling.

Honglei Cong

Zhiyong Wu

Lianhong Cai