Wei Li

Zhemin Zhuang

Tardi Tjahjadi

PeerJ Comput. Sci., 2022

Improving Information Literacy of Engineering Doctorate Based on Team Role Model.

[DOI]

Proceedings of the Computer Science and Education - 17th International Conference, 2022

2018

Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.

[DOI]

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.

[DOI]

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis.

[DOI]

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning.

[DOI]

Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018

2017

Multi-scale Context Based Attention for Dynamic Music Emotion Prediction.

[DOI]

Proceedings of the 2017 ACM on Multimedia Conference, 2017

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.

[DOI]

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A systematic approach to compute perceptual distribution of monosyllables.

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.

[DOI]

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Learning robust uniform features for cross-media social data by using cross autoencoders.

[DOI]

Knowl. Based Syst., 2016

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition.

[DOI]

CoRR, 2016

DBLSTM-based multi-task learning for pitch transformation in voice conversion.

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

THear: Development of a mobile multimodal audiometry application on a cross-platform framework.

[DOI]

Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Analysis on Gated Recurrent Unit Based Question Detection Approach.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.

[DOI]

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Heterogeneity-entropy based unsupervised feature learning for personality prediction with cross-media data.

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Recognizing stances in Mandarin social ideological debates with text and acoustic features.

[DOI]

Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016

DBLSTM-based multi-scale fusion for dynamic emotion prediction in music.

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

SVR based double-scale regression for dynamic emotion prediction in music.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Question detection from acoustic features using recurrent neural network with gated recurrent unit.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A deep bidirectional long short-term memory based multi-scale approach for music dynamic emotion prediction.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.

[DOI]

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Modeling Emotion Influence in Image Social Networks.

[DOI]

IEEE Trans. Affect. Comput., 2015

Generating emphatic speech with hidden Markov model for expressive speech synthesis.

[DOI]

Multim. Tools Appl., 2015

Using tilt for automatic emphasis detection with Bayesian networks.

[DOI]

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

MPHA: A Personal Hearing Doctor Based on Mobile Devices.

[DOI]

Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A deep recurrent approach for acoustic-to-articulatory inversion.

[DOI]

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.

[DOI]

Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.

[DOI]

Multim. Tools Appl., 2014

Head and facial gestures synthesis using PAD model for an expressive talking avatar.

[DOI]

Multim. Tools Appl., 2014

Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.

[DOI]

J. Comput. Sci. Technol., 2014

Modeling Emotion Influence from Images in Social Networks.

[DOI]

CoRR, 2014

A computational cognition model of perception, memory, and judgment.

[DOI]

Sci. China Inf. Sci., 2014

Inferring Emotions from Social Images Leveraging Influence Analysis.

[DOI]

Proceedings of the Social Media Processing - Third National Conference, 2014

Learning to Infer Public Emotions from Large-Scale Networked Voice Data.

[DOI]

Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

User-level psychological stress detection from social media using deep neural network.

[DOI]

Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Automatic speech data clustering with human perception based weighted distance.

[DOI]

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Algorithm of pure tone audiometry based on multiple judgment.

[DOI]

Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Using conditional random fields to predict focus word pair in spontaneous spoken English.

[DOI]

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improved keyword spotting system by optimizing posterior confidence measure vector using feed-forward neural network.

[DOI]

Yuchen Liu

Mingxing Xu

Proceedings of the 2014 International Joint Conference on Neural Networks, 2014

Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data.

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Psychological stress detection from cross-media microblog data using Deep Sparse Neural Network.

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Contrastive auto-encoder for phoneme recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Learning dynamic features with neural networks for phoneme recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2014

Automatic Emotion Variation Detection in continuous speech.

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014

2013

Affective image adjustment with a single word.

[DOI]

Xiaohui Wang

Vis. Comput., 2013

Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.

[DOI]

CoRR, 2013

WeCard: a multimodal solution for making personalized electronic greeting cards.

[DOI]

Proceedings of the ACM Multimedia Conference, 2013

SNR estimation for clipped audio based on amplitude distribution.

[DOI]

Xiaoqing Liu

Proceedings of the Ninth International Conference on Natural Computation, 2013

Interpretable aesthetic features for affective image classification.

[DOI]

Proceedings of the IEEE International Conference on Image Processing, 2013

Investigation of tandem deep belief network approach for phoneme recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2013

A real-time speech driven talking avatar based on deep neural network.

[DOI]

Kai Zhao

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

TalkingAndroid: An interactive, multimodal and real-time talking avatar application on mobile phones.

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition.

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

2012

Affective Image Colorization.

[DOI]

J. Comput. Sci. Technol., 2012

Comparison of adaptation methods for GMM-SVM based speech emotion recognition.

[DOI]

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Understanding the emotional impact of images.

[DOI]

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Can we understand van Gogh's mood?: learning to infer affects from images in social networks.

[DOI]

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.

[DOI]

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A real-time tone enhancement method for continuous Mandarin speeches.

[DOI]

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.

[DOI]

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis.

[DOI]

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Analysis on mispronunciations in CAPT based on computational speech perception.

[DOI]

Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.

[DOI]

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Intention understanding based on multi-source information integration for Chinese Mandarin spoken commands.

[DOI]

Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012

Image Colorization with an Affective Word.

[DOI]

Proceedings of the Computational Visual Media - First International Conference, 2012

Modeling the correlation between modality semantics and facial expressions.

[DOI]

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

2011

Emotional Audio-Visual Speech Synthesis Based on PAD.

[DOI]

IEEE Trans. Speech Audio Process., 2011

Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis.

[DOI]

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Lyrics to Singing Voice Synthesis System with Variable Timbre.

[DOI]

Proceedings of the Applied Informatics and Communication - International Conference, 2011

2010

Modeling prosody patterns for Chinese expressive text-to-speech synthesis.

[DOI]

Helen M. Meng

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Investigation of the relation between acoustic features and articulation - An application to emotional speech analysis.

[DOI]

Yongxin Wang

Jianwu Dang

Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

HMM based TTS for mixed language text.

[DOI]

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Comparison of Syllable/Phone HMM Based Mandarin TTS.

[DOI]

Proceedings of the 20th International Conference on Pattern Recognition, 2010

Emotional talking agent: System and evaluation.

[DOI]

Proceedings of the Sixth International Conference on Natural Computation, 2010

Facial expression synthesis based on motion patterns learned from face database.

[DOI]

Shen Zhang

Proceedings of the International Conference on Image Processing, 2010

The Intelligent Music Editor: Towards an Automated Platform for Music Analysis and Editing.

[DOI]

Yuxiang Liu

Roger B. Dannenberg

Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, 2010

Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.

[DOI]

Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010

2009

Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.

[DOI]

IEEE Trans. Speech Audio Process., 2009

Syllable HMM based Mandarin TTS and comparison with concatenative TTS.

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Voiced/unvoiced decision algorithm for HMM-based speech synthesis.

[DOI]

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Automatic Emphasis Labeling for Emotional Speech by Measuring Prosody Generation Error.

[DOI]

Jun Xu

Proceedings of the Emerging Intelligent Computing Technology and Applications, 2009

Cultural style based music classification of audio signals.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2009

2008

Clustering Music Recordings by Their Keys.

[DOI]

Proceedings of the ISMIR 2008, 2008

Analysis and Modeling of Affective Audio Visual Speech Based on PAD Emotion Space.

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A New Prosodic Strength Calculation Method for Prosody Reduction Modeling.

[DOI]

Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Entering Tone Recognition in a Support Vector Machine Approach.

[DOI]

Xiangcheng Wang

Ying Liu

Proceedings of the Fourth International Conference on Natural Computation, 2008

2007

Fingerprint matching based on weighting method and the SVM.

[DOI]

Neurocomputing, 2007

Hierarchical non-uniform unit selection based on prosodic structure.

[DOI]

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Fake Finger Detection Based on Time-Series Fingerprint Image Analysis.

[DOI]

Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2007

A New Approach to Fake Finger Detection Based on Skin Elasticity Analysis.

[DOI]

Proceedings of the Advances in Biometrics, International Conference, 2007

Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2007

Script Design Based on Decision Tree with Context Vector and Acoustic Distance for Mandarin TTS.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2007

Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar.

[DOI]

Proceedings of the Affective Computing and Intelligent Interaction, 2007

Affect Related Acoustic Features of Speech and Their Modification.

[DOI]

Proceedings of the Affective Computing and Intelligent Interaction, 2007

2006

A flexible framework for key audio effects detection and auditory context inference.

[DOI]

IEEE Trans. Speech Audio Process., 2006

Perceptually Weighted Mel-Cepstrum Analysis of Speech Based on Psychoacoustic Model.

[DOI]

Hongwu Yang

Dezhi Huang

IEICE Trans. Inf. Syst., 2006

Modelling the Global acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis.

[DOI]

Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006

Prosodic Boundary Prediction Based on Maximum Entropy Model with Error-Driven Modification.

[DOI]

Xiaonan Zhang

Jun Xu

Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Spectral Continuity Measures at Mandarin Syllable Boundaries.

[DOI]

Jun Xu

Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Investigation on Pleasure Related Acoustic Features of Affective Speech.

[DOI]

Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis.

[DOI]

Hongwu Yang

Helen M. Meng

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar.

[DOI]

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Acoustic and Physiological Feature Analysis of Affective Speech.

[DOI]

Dandan Cui

Proceedings of the Computational Intelligence, 2006

Multi-level Fusion of Audio and Visual Features for Speaker Identification.

[DOI]

Helen M. Meng

Proceedings of the Advances in Biometrics, International Conference, 2006

2005

A TSVM-Based Minutiae Matching Approach for Fingerprint Verification.

[DOI]

Proceedings of the Advances in Biometric Person Authentication, 2005

Grapheme-to-phoneme conversion based on TBL algorithm in Mandarin TTS system.

[DOI]

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Prosody Analysis and Modeling for Emotional Speech Synthesis.

[DOI]

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Unsupervised auditory scene categorization via key audio effects and information-theoretic co-clustering.

[DOI]

Rui Cai

Lie Lu

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Grapheme-to-Phoneme Conversion Based on a Fast TBL Algorithm in Mandarin TTS Systems.

[DOI]

Proceedings of the Fuzzy Systems and Knowledge Discovery, Second International Conference, 2005

2004

Classifying emotion in Chinese speech by decomposing prosodic features.

[DOI]

Dan-Ning Jiang

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Face Pose Estimation and its Application in Video Shot Selection.

[DOI]

Proceedings of the 17th International Conference on Pattern Recognition, 2004

Speech emotion classification with the combination of statistic features and temporal features.

[DOI]

Dan-Ning Jiang

Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004

Improve audio representation by using feature structure patterns.

[DOI]

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

2003

Approach to the Correlation Discovery of Chinese Linguistic Parameters Based on Bayesian Method.

[DOI]

Wei Wang

J. Comput. Sci. Technol., 2003

An Improved Framework for Online Adaptive Information Filtering.

[DOI]

Liang Ma

Qunxiu Chen

Proceedings of the Advances in Web-Age Information Management, 2003

An adaptive system for online document filtering.

[DOI]

Liang Ma

Qunxiu Chen

Proceedings of the IEEE International Conference on Systems, 2003

Highlight sound effects detection in audio stream.

[DOI]

Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003

2002

Incremental Learning for Profile Training in Adaptive Document Filtering.

[DOI]

Proceedings of The Eleventh Text REtrieval Conference, 2002

Voice quality analysis under the pitch effect.

[DOI]

Dan-Ning Jiang

Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Annotation of Chinese prosodic level based on probabilistic model.

[DOI]

Rui Cai

Zhi-Yong Wu

Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Automatic stress prediction of Chinese speech synthesis.

[DOI]

Sheng Zhao

Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Prosodic phrasing with inductive learning.

[DOI]

Sheng Zhao

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Clustering and feature learning based F0 prediction for Chinese speech synthesis.

[DOI]

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Music type classification by spectral contrast feature.

[DOI]

Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Learning Rules for Chinese Prosodic Phrase Prediction.

[DOI]

Sheng Zhao

Proceedings of the First Workshop on Chinese Language Processing, 2002

2000

Research on dynamic characters of Chinese pitch contours.

[DOI]

Tongchun Zhou

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

The design and application of a speech database for Chinese TTS system.

[DOI]

Muhua Lv

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998

The Statistical Model of Chinese Word Contours Based on Fuzzy.

[DOI]