2022
FPGA-based systolic deconvolution architecture for upsampling.
PeerJ Comput. Sci., 2022
Improving Information Literacy of Engineering Doctorate Based on Team Role Model.
Proceedings of the Computer Science and Education - 17th International Conference, 2022
2018
Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Applying Multitask Learning to Acoustic-Phonemic Model for Mispronunciation Detection and Diagnosis in L2 English Speech.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMs for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Multi-modal Multi-scale Speech Expression Evaluation in Computer-Assisted Language Learning.
Proceedings of the Artificial Intelligence and Mobile Services - AIMS 2018, 2018
2017
Multi-scale Context Based Attention for Dynamic Music Emotion Prediction.
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Multi-Task Learning for Prosodic Structure Generation Using BLSTM RNN with Structured Output Layer.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
A systematic approach to compute perceptual distribution of monosyllables.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
2016
Learning robust uniform features for cross-media social data by using cross autoencoders.
Knowl. Based Syst., 2016
Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition.
CoRR, 2016
DBLSTM-based multi-task learning for pitch transformation in voice conversion.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
THear: Development of a mobile multimodal audiometry application on a cross-platform framework.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Analysis on Gated Recurrent Unit Based Question Detection Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Combining CNN and BLSTM to Extract Textual and Acoustic Features for Recognizing Stances in Mandarin Ideological Debate Competition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Heterogeneity-entropy based unsupervised feature learning for personality prediction with cross-media data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016
Recognizing stances in Mandarin social ideological debates with text and acoustic features.
Proceedings of the 2016 IEEE International Conference on Multimedia & Expo Workshops, 2016
DBLSTM-based multi-scale fusion for dynamic emotion prediction in music.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016
Learning cross-lingual information with multilingual BLSTM for speech synthesis of low-resource languages.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
SVR based double-scale regression for dynamic emotion prediction in music.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Question detection from acoustic features using recurrent neural network with gated recurrent unit.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
A deep bidirectional long short-term memory based multi-scale approach for music dynamic emotion prediction.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
2015
Modeling Emotion Influence in Image Social Networks.
IEEE Trans. Affect. Comput., 2015
Generating emphatic speech with hidden Markov model for expressive speech synthesis.
Multim. Tools Appl., 2015
Using tilt for automatic emphasis detection with Bayesian networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
MPHA: A Personal Hearing Doctor Based on Mobile Devices.
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015
HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
A deep recurrent approach for acoustic-to-articulatory inversion.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014
Head and facial gestures synthesis using PAD model for an expressive talking avatar.
Multim. Tools Appl., 2014
Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.
J. Comput. Sci. Technol., 2014
Modeling Emotion Influence from Images in Social Networks.
CoRR, 2014
A computational cognition model of perception, memory, and judgment.
Sci. China Inf. Sci., 2014
Inferring Emotions from Social Images Leveraging Influence Analysis.
Proceedings of the Social Media Processing - Third National Conference, 2014
Learning to Infer Public Emotions from Large-Scale Networked Voice Data.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014
User-level psychological stress detection from social media using deep neural network.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014
Automatic speech data clustering with human perception based weighted distance.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Algorithm of pure tone audiometry based on multiple judgment.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Improved keyword spotting system by optimizing posterior confidence measure vector using feed-forward neural network.
Proceedings of the 2014 International Joint Conference on Neural Networks, 2014
Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014
Psychological stress detection from cross-media microblog data using Deep Sparse Neural Network.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014
Contrastive auto-encoder for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014
Learning dynamic features with neural networks for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014
Automatic Emotion Variation Detection in continuous speech.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2014
2013
Affective image adjustment with a single word.
Vis. Comput., 2013
Feature Learning with Gaussian Restricted Boltzmann Machine for Robust Speech Recognition.
CoRR, 2013
WeCard: a multimodal solution for making personalized electronic greeting cards.
Proceedings of the ACM Multimedia Conference, 2013
SNR estimation for clipped audio based on amplitude distribution.
Proceedings of the Ninth International Conference on Natural Computation, 2013
Interpretable aesthetic features for affective image classification.
Proceedings of the IEEE International Conference on Image Processing, 2013
Investigation of tandem deep belief network approach for phoneme recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013
A real-time speech driven talking avatar based on deep neural network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
TalkingAndroid: An interactive, multimodal and real-time talking avatar application on mobile phones.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
2012
Affective Image Colorization.
J. Comput. Sci. Technol., 2012
Comparison of adaptation methods for GMM-SVM based speech emotion recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Understanding the emotional impact of images.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012
Can we understand van Gogh's mood?: learning to infer affects from images in social networks.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012
Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
A real-time tone enhancement method for continuous Mandarin speeches.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Detection and emphatic realization of contrastive word pairs for expressive text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Analysis on mispronunciations in CAPT based on computational speech perception.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Intention understanding based on multi-source information integration for Chinese Mandarin spoken commands.
Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012
Image Colorization with an Affective Word.
Proceedings of the Computational Visual Media - First International Conference, 2012
Modeling the correlation between modality semantics and facial expressions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
Emotional Audio-Visual Speech Synthesis Based on PAD.
IEEE Trans. Speech Audio Process., 2011
Combining Active and Semi-Supervised Learning for Homograph Disambiguation in Mandarin Text-to-Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
A Lyrics to Singing Voice Synthesis System with Variable Timbre.
Proceedings of the Applied Informatics and Communication - International Conference, 2011
2010
Modeling prosody patterns for Chinese expressive text-to-speech synthesis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Investigation of the relation between acoustic features and articulation - An application to emotional speech analysis.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
HMM based TTS for mixed language text.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Comparison of Syllable/Phone HMM Based Mandarin TTS.
Proceedings of the 20th International Conference on Pattern Recognition, 2010
Emotional talking agent: System and evaluation.
Proceedings of the Sixth International Conference on Natural Computation, 2010
Facial expression synthesis based on motion patterns learned from face database.
Proceedings of the International Conference on Image Processing, 2010
The Intelligent Music Editor: Towards an Automated Platform for Music Analysis and Editing.
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence, 2010
Facial Expression Synthesis Based on Emotion Dimensions for Affective Talking Avatar.
Proceedings of the Modeling Machine Emotions for Realizing Intelligence, 2010
2009
Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System.
IEEE Trans. Speech Audio Process., 2009
Syllable HMM based Mandarin TTS and comparison with concatenative TTS.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Voiced/unvoiced decision algorithm for HMM-based speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Automatic Emphasis Labeling for Emotional Speech by Measuring Prosody Generation Error.
Proceedings of the Emerging Intelligent Computing Technology and Applications, 2009
Cultural style based music classification of audio signals.
Proceedings of the IEEE International Conference on Acoustics, 2009
2008
Clustering Music Recordings by Their Keys.
Proceedings of the ISMIR 2008, 2008
Analysis and Modeling of Affective Audio Visual Speech Based on PAD Emotion Space.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
A New Prosodic Strength Calculation Method for Prosody Reduction Modeling.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Entering Tone Recognition in a Support Vector Machine Approach.
Proceedings of the Fourth International Conference on Natural Computation, 2008
2007
Fingerprint matching based on weighting method and the SVM.
Neurocomputing, 2007
Hierarchical non-uniform unit selection based on prosodic structure.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Fake Finger Detection Based on Time-Series Fingerprint Image Analysis.
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2007
A New Approach to Fake Finger Detection Based on Skin Elasticity Analysis.
Proceedings of the Advances in Biometrics, International Conference, 2007
Head Movement Synthesis Based on Semantic and Prosodic Features for a Chinese Expressive Avatar.
Proceedings of the IEEE International Conference on Acoustics, 2007
Script Design Based on Decision Tree with Context Vector and Acoustic Distance for Mandarin TTS.
Proceedings of the IEEE International Conference on Acoustics, 2007
Facial Expression Synthesis Using PAD Emotional Parameters for a Chinese Expressive Avatar.
Proceedings of the Affective Computing and Intelligent Interaction, 2007
Affect Related Acoustic Features of Speech and Their Modification.
Proceedings of the Affective Computing and Intelligent Interaction, 2007
2006
A flexible framework for key audio effects detection and auditory context inference.
IEEE Trans. Speech Audio Process., 2006
Perceptually Weighted Mel-Cepstrum Analysis of Speech Based on Psychoacoustic Model.
IEICE Trans. Inf. Syst., 2006
Modelling the Global Acoustic Correlates of Expressivity for Chinese Text-to-speech Synthesis.
Proceedings of the 2006 IEEE ACL Spoken Language Technology Workshop, 2006
Prosodic Boundary Prediction Based on Maximum Entropy Model with Error-Driven Modification.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Spectral Continuity Measures at Mandarin Syllable Boundaries.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Investigation on Pleasure Related Acoustic Features of Affective Speech.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Modeling the acoustic correlates of expressive elements in text genres for expressive text-to-speech synthesis.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006
Acoustic and Physiological Feature Analysis of Affective Speech.
Proceedings of the Computational Intelligence, 2006
Multi-level Fusion of Audio and Visual Features for Speaker Identification.
Proceedings of the Advances in Biometrics, International Conference, 2006
2005
A TSVM-Based Minutiae Matching Approach for Fingerprint Verification.
Proceedings of the Advances in Biometric Person Authentication, 2005
Grapheme-to-phoneme conversion based on TBL algorithm in Mandarin TTS system.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Prosody Analysis and Modeling for Emotional Speech Synthesis.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Unsupervised auditory scene categorization via key audio effects and information-theoretic co-clustering.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005
Grapheme-to-Phoneme Conversion Based on a Fast TBL Algorithm in Mandarin TTS Systems.
Proceedings of the Fuzzy Systems and Knowledge Discovery, Second International Conference, 2005
2004
Classifying emotion in Chinese speech by decomposing prosodic features.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Face Pose Estimation and its Application in Video Shot Selection.
Proceedings of the 17th International Conference on Pattern Recognition, 2004
Speech emotion classification with the combination of statistic features and temporal features.
Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, 2004
Improve audio representation by using feature structure patterns.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2003
Approach to the Correlation Discovery of Chinese Linguistic Parameters Based on Bayesian Method.
J. Comput. Sci. Technol., 2003
An Improved Framework for Online Adaptive Information Filtering.
Proceedings of the Advances in Web-Age Information Management, 2003
An adaptive system for online document filtering.
Proceedings of the IEEE International Conference on Systems, 2003
Highlight sound effects detection in audio stream.
Proceedings of the 2003 IEEE International Conference on Multimedia and Expo, 2003
2002
Incremental Learning for Profile Training in Adaptive Document Filtering.
Proceedings of The Eleventh Text REtrieval Conference, 2002
Voice quality analysis under the pitch effect.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Annotation of Chinese prosodic level based on probabilistic model.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Automatic stress prediction of Chinese speech synthesis.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Prosodic phrasing with inductive learning.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Clustering and feature learning based F0 prediction for Chinese speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Music type classification by spectral contrast feature.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002
Learning Rules for Chinese Prosodic Phrase Prediction.
Proceedings of the First Workshop on Chinese Language Processing, 2002
2000
Research on dynamic characters of Chinese pitch contours.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
The design and application of a speech database for Chinese TTS system.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
1998
The Statistical Model of Chinese Word Contours Based on Fuzzy.
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998
1987
A large-vocabulary Chinese speech recognition system.
Proceedings of the IEEE International Conference on Acoustics, 1987