Zhiyong Wu
ORCID: 0000-0001-8533-0524
Affiliations:
- Tsinghua University, Joint Research Center for Media Sciences, Beijing, China (PhD)
- Chinese University of Hong Kong, Hong Kong
According to our database, Zhiyong Wu authored at least 242 papers between 2000 and 2024.
Bibliography
2024
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions.
CoRR, 2024
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis.
CoRR, 2024
RobustSVC: HuBERT-based Melody Extractor and Adversarial Learning for Robust Singing Voice Conversion.
CoRR, 2024
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement.
CoRR, 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models.
CoRR, 2024
CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction.
CoRR, 2024
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
Robust Representation Learning for Multimodal Emotion Recognition with Contrastive Learning and Mixup.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
The THU-HCSI Multi-Speaker Multi-Lingual Few-Shot Voice Cloning System for LIMMITS'24 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024
FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness.
Proceedings of the IEEE International Conference on Acoustics, 2024
Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2024
Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2024
Multi-View Midivae: Fusing Track- and Bar-View Representations for Long Multi-Track Symbolic Music Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts.
Proceedings of the IEEE International Conference on Acoustics, 2024
Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information.
Proceedings of the IEEE International Conference on Acoustics, 2024
Stylespeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2024
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction.
Proceedings of the IEEE International Conference on Acoustics, 2024
Collaboration of Digital Human Gestures and Teaching Materials for Enhanced Integration in MOOC Teaching Scenarios.
Proceedings of the HCI International 2024 Posters, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
MSStyleTTS: Multi-Scale Style Modeling With Hierarchical Context Information for Expressive Speech Synthesis.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time Speech Enhancement in RTC Scenarios.
IEEE Signal Process. Lett., 2023
AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation.
CoRR, 2023
CoRR, 2023
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
CoRR, 2023
Joint Multi-scale Cross-lingual Speaking Style Transfer with Bidirectional Attention Mechanism for Automatic Dubbing.
CoRR, 2023
First-order Multi-label Learning with Cross-modal Interactions for Multimodal Emotion Recognition.
Proceedings of the 1st International Workshop on Multimodal and Responsible Affective Computing, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
SpeechTripleNet: End-to-End Disentangled Speech Representation Learning for Content, Timbre and Prosody.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
MC-SpEx: Towards Effective Speaker Extraction with Multi-Scale Interfusion and Conditional Speaker Modulation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the 25th International Conference on Multimodal Interaction, 2023
Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation-based Voice Conversion.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
GTN-Bailando: Genre Consistent long-Term 3D Dance Generation Based on Pre-Trained Genre Token Network.
Proceedings of the IEEE International Conference on Acoustics, 2023
Enhancing the Vocal Range of Single-Speaker Singing Voice Synthesis with Melody-Unsupervised Pre-Training.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
TrimTail: Low-Latency Streaming ASR with Simple But Effective Spectrogram-Level Length Penalty.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Context-Aware Coherent Speaking Style Prediction with Hierarchical Transformers for Audiobook Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition.
CoRR, 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE.
CoRR, 2022
Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives.
CoRR, 2022
Disentangling Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion.
CoRR, 2022
Disentangled Speech Representation Learning for One-Shot Cross-Lingual Voice Conversion Using β-VAE.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022
Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the International Joint Conference on Neural Networks, 2022
Proceedings of the International Conference on Multimodal Interaction, 2022
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022
Disentangling Content and Fine-Grained Prosody Information Via Hybrid ASR Bottleneck Features for Voice Conversion.
Proceedings of the IEEE International Conference on Acoustics, 2022
An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Neufa: Neural Network Based End-to-End Forced Alignment with Bidirectional Attention Mechanism.
Proceedings of the IEEE International Conference on Acoustics, 2022
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-Based Multi-Modal Context Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2022
An End-to-End Chinese Text Normalization Model Based on Rule-Guided Flat-Lattice Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2022
FullSubNet+: Channel Attention Fullsubnet with Complex Spectrograms for Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis.
Proceedings of the 29th International Conference on Computational Linguistics, 2022
2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Spoken Style Learning with Multi-modal Hierarchical Context Encoding for Conversational Text-to-Speech Synthesis.
CoRR, 2021
Dependency Parsing based Semantic Representation Learning with Graph Neural Network for Enhancing Expressiveness of Text-to-Speech.
CoRR, 2021
Adversarially learning disentangled speech representations for robust multi-factor voice conversion.
CoRR, 2021
Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Adversarially Learning Disentangled Speech Representations for Robust Multi-Factor Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
VAENAR-TTS: Variational Auto-Encoder Based Non-AutoRegressive Text-to-Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models.
Proceedings of the IEEE International Conference on Acoustics, 2021
The Huya Multi-Speaker and Multi-Style Speech Synthesis System for M2voc Challenge 2020.
Proceedings of the IEEE International Conference on Acoustics, 2021
Improving Pronunciation Assessment Via Ordinal Regression with Anchored Reference Samples.
Proceedings of the IEEE International Conference on Acoustics, 2021
Syntactic Representation Learning For Neural Network Based TTS with Syntactic Parse Tree Traversal.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Emotion Controllable Speech Synthesis Using Emotion-Unlabeled Dataset with the Assistance of Cross-Domain Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Learning from Multiple Noisy Augmented Data Sets for Better Cross-Lingual Spoken Language Understanding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback.
Proceedings of the CHI '21: CHI Conference on Human Factors in Computing Systems, 2021
Reconstructing Dual Learning for Neural Voice Conversion Using Relatively Few Samples.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Unsupervised Cross-Lingual Speech Emotion Recognition Using Domain Adversarial Neural Network.
CoRR, 2020
Noise Robust TTS for Low Resource Speakers using Pre-trained Model and Speech Enhancement.
CoRR, 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
FERNet: Fine-grained Extraction and Reasoning Network for Emotion Recognition in Dialogues.
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, 2020
Channel-Wise Dense Connection Graph Convolutional Network for Skeleton-Based Action Recognition.
Proceedings of the 25th International Conference on Pattern Recognition, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Code-Switched Speech Synthesis Using Bilingual Phonetic Posteriorgram with Only Monolingual Corpora.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
2019
CoRR, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network.
Proceedings of the International Conference on Multimodal Interaction, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Quasi-fully Convolutional Neural Network with Variational Inference for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2019
Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Multi-Scale Convolutional Recurrent Neural Network with Ensemble Method for Weakly Labeled Sound Event Detection.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019
2018
Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks.
Speech Commun., 2018
Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Rapid Style Adaptation Using Residual Error Embedding for Expressive Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Integrating Articulatory Features into Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Unsupervised Discovery of an Extended Phoneme Set in L2 English Speech for Mispronunciation Detection and Diagnosis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMS for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Learning Frame-Level Recurrent Neural Networks Representations for Query-by-Example Spoken Term Detection on Mobile Devices.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018
Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018
2017
Proceedings of the MultiMedia Modeling - 23rd International Conference, 2017
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Multi-task learning of structured output layer bidirectional LSTMS for speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
CoRR, 2016
Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016
Proceedings of the Advances in Multimedia Information Processing - PCM 2016, 2016
Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems, 2016
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Heterogeneity-entropy based unsupervised feature learning for personality prediction with cross-media data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016
Recognizing stances in Mandarin social ideological debates with text and acoustic features.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016
Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Question detection from acoustic features using recurrent neural network with gated recurrent unit.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Multim. Tools Appl., 2015
Proceedings of the 23rd Annual ACM Conference on Multimedia Conference, MM '15, Brisbane, Australia, October 26, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Modelling High-Dimensional Sequences with LSTM-RTRBM: Application to Polyphonic Music Generation.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015
HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014
Multim. Tools Appl., 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Multi-channel speech enhancement using sparse coding on local time-frequency structures.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.
CoRR, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
2012
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 20th International Conference on Pattern Recognition, 2010
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.
Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010
2009
Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.
IEEE Trans. Speech Audio Process., 2009
2008
The Use of Dynamic Deformable Templates for Lip Tracking in an Audio-Visual Corpus with Large Variations in Head Pose, Face Illumination and Lip Shapes.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
2007
Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar.
Proceedings of the IEEE International Conference on Acoustics, 2007
Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar.
Proceedings of the Affective Computing and Intelligent Interaction, 2007
2006
Modelling the Global acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Proceedings of the Advances in Biometrics, International Conference, 2006
2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000