Jia Jia

Orcid: 0009-0005-8449-278X

  • Tsinghua University, Graduate School at Shenzhen, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, China
  • Tsinghua University, Department of Computer Science and Technology, TNList, Beijing, China (PhD 2008)

According to our database, Jia Jia authored at least 174 papers between 2005 and 2024.

VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding.
CoRR, 2024

Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

PlacidDreamer: Advancing Harmony in Text-to-3D Generation.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Inner Classifier-Free Guidance and Its Taylor Expansion for Diffusion Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SoulSkipper: A Voice-Controlled Emotional Adaptive Game to Complement Therapy for Social Anxiety Disorder.
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024

Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos.
CoRR, 2023

A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis.
CoRR, 2023

MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation.
CoRR, 2023

Semantics2Hands: Transferring Hand Motion Semantics between Avatars.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Speech-Driven 3D Face Animation with Composite and Regional Facial Movements.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Prosody Modeling with 3D Visual Information for Expressive Video Dubbing.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation.
Proceedings of the International Conference on Machine Learning, 2023

Salient Co-Speech Gesture Synthesizing with Discrete Motion Representation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

MSNet: A Deep Architecture Using Multi-Sentiment Semantics for Sentiment-Aware Image Style Transfer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Shuffled Autoregression for Motion Interpolation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

CatHill: Emotion-Based Interactive Storytelling Game as a Digital Mental Health Intervention.
Proceedings of the Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023

What Does Your Face Sound Like? 3D Face Shape towards Voice.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Human motion modeling with deep learning: A survey.
AI Open, January, 2022

AI Carpet: Automatic Generation of Aesthetic Carpet Pattern.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

GroupDancer: Music to Multi-People Dance Synthesis with Style Collaboration.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Towards Cross-speaker Reading Style Transfer on Audiobook Dataset.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speaker Characteristics Guided Speech Synthesis.
Proceedings of the International Joint Conference on Neural Networks, 2022

Learning from Designers: Fashion Compatibility Analysis Via Dataset Distillation.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis.
CoRR, 2021

Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Towards Multi-Scale Style Control for Expressive Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Wander: A breath-control Audio Game to Support Sound Sleep.
Proceedings of the CHI PLAY '21: The Annual Symposium on Computer-Human Interaction in Play, 2021

PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback.
Proceedings of the CHI '21: CHI Conference on Human Factors in Computing Systems, 2021

Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Corrective feedback, emphatic speech synthesis, visual-speech exaggeration, pronunciation learning.
CoRR, 2020

Inferring Emphasis for Real Voice Data: An Attentive Multimodal Neural Network Approach.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020

ChoreoNet: Towards Music to Dance Synthesis with Choreographic Action Unit.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Aesthetic-Aware Image Style Transfer.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Visual-speech Synthesis of Exaggerated Corrective Feedback.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Enhancing Music Recommendation with Social Media Content: an Attentive Multimodal Autoencoder Approach.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

Cross-VAE: Towards Disentangling Expression from Identity For Human Faces.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Online Intelligent Music Recommendation: The Opportunity and Challenge for People Well-Being Improvement.
Proceedings of the 2nd IEEE International Conference on Cognitive Machine Intelligence, 2020

Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

PEIA: Personality and Emotion Integrated Attentive Model for Music Recommendation on Social Media Platforms.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Inferring Emotions From Large-Scale Internet Voice Data.
IEEE Trans. Multim., 2019

Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional modelling.
CoRR, 2019

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding.
CoRR, 2019

Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional Modeling.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

One-Shot Voice Conversion with Global Speaker Embeddings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Online Attention-Based Model for Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Towards Discriminative Representation Learning for Speech Emotion Recognition.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Design and Implementation of a Disambiguity Framework for Smart Voice Controlled Devices.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network.
Proceedings of the International Conference on Multimodal Interaction, 2019

Modality Attention for End-to-end Audio-visual Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Emotional Design for Children's Electronic Picture Book.
Proceedings of the Human-Computer Interaction. Perspectives on Design, 2019

Query-by-Example Spoken Term Detection using Attentive Pooling Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Exploring RNN-Transducer for Chinese speech recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Analyzing and Predicting Emoji Usages in Social Media.
Proceedings of the Companion of The Web Conference 2018, 2018

AI Painting: An Aesthetic Painting Generation System.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

AniDance: Real-Time Dance Motion Synthesize to the Song.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Dance with Melody: An LSTM-autoencoder Approach to Music-oriented Dance Synthesis.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

MAHCI 2018: The 1st Workshop on Multimedia for Accessible Human Computer Interface.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

IcooBook: When the Picture Book for Children Encounters Aesthetics of Interaction.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

The Sogou-TIIC Speech Translation System for IWSLT 2018.
Proceedings of the 15th International Conference on Spoken Language Translation, 2018

Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Speech Super-Resolution Using Parallel WaveNet.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Cross-Domain Depression Detection via Harvesting Social Media.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Mental Health Computing via Harvesting Social Media Data.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Inferring Emotions from Image Social Networks Using Group-Based Factor Graph Model.
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018

Understanding The Aesthetic Styles of Social Images.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMS for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Lookine: Let the Blind Hear a Smile.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

Trip Outfits Advisor: Location-Oriented Clothing Recommendation.
IEEE Trans. Multim., 2017

Inferring Emotional Tags From Social Images With User Demographics.
IEEE Trans. Multim., 2017

Mobile Contextual Recommender System for Online Social Media.
IEEE Trans. Mob. Comput., 2017

Detecting Stress Based on Social Interactions in Social Networks.
IEEE Trans. Knowl. Data Eng., 2017

Analyzing and Identifying Teens' Stressful Periods and Stressor Events From a Microblog.
IEEE J. Biomed. Health Informatics, 2017

Multi-scale Context Based Attention for Dynamic Music Emotion Prediction.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

PIC2DISH: A Customized Cooking Assistant System.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Inferring emotions from heterogeneous social media data: A Cross-media Auto-Encoder solution.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

A systematic approach to compute perceptual distribution of monosyllables.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Towards Better Understanding the Clothing Fashion Styles: A Multimodal Deep Learning Approach.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

SenseRun: Real-Time Running Routes Recommendation towards Providing Pleasant Running Experiences.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

A Virtual Personal Fashion Consultant: Learning from the Personal Preference of Fashion.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

AniDraw: When Music and Dance Meet Harmoniously.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Learning robust uniform features for cross-media social data by using cross autoencoders.
Knowl. Based Syst., 2016

A systematic exploration of the micro-blog feature space for teens stress detection.
Health Inf. Sci. Syst., 2016

Study on Feature Subspace of Archetypal Emotions for Speech Emotion Recognition.
CoRR, 2016

Analysis of Teens' Chronic Stress on Micro-blog.
Proceedings of the Web Information Systems Engineering - WISE 2016, 2016

Affective Contextual Mobile Recommender System.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Magic Mirror: A Virtual Fashion Consultant.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

THear: Development of a mobile multimodal audiometry application on a cross-platform framework.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phoneme Embedding and its Application to Speech Driven Talking Avatar Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

What Does Social Media Say about Your Stress?
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Inferring users' emotions for human-mobile voice dialogue applications.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

Social Role-Aware Emotion Contagion in Image Social Networks.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Representation Learning of Knowledge Graphs with Entity Descriptions.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Moodee: An Intelligent Mobile Companion for Sensing Your Stress from Your Social Media Postings.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Learning to Appreciate the Aesthetic Effects of Clothing.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

Modeling Emotion Influence in Image Social Networks.
IEEE Trans. Affect. Comput., 2015

Expressive talking avatar synthesis and animation.
Multim. Tools Appl., 2015

Generating emphatic speech with hidden Markov model for expressive speech synthesis.
Multim. Tools Appl., 2015

Using tilt for automatic emphasis detection with Bayesian networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Teenagers' Stress Detection Based on Time-Sensitive Micro-blog Comment/Response Actions.
Proceedings of the Artificial Intelligence in Theory and Practice IV, 2015

Release Adolescent Stress by Virtual Chatting.
Proceedings of the Engineering the Web in the Big Data Era - 15th International Conference, 2015

MPHA: A Personal Hearing Doctor Based on Mobile Devices.
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015

Understanding the emotions behind social images: Inferring with user demographics.
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015

HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

TeenChat: A Chatterbot System for Sensing and Releasing Adolescents' Stress.
Proceedings of the Health Information Science - 4th International Conference, 2015

Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014

Head and facial gestures synthesis using PAD model for an expressive talking avatar.
Multim. Tools Appl., 2014

Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.
J. Comput. Sci. Technol., 2014

Modeling Emotion Influence from Images in Social Networks.
CoRR, 2014

A computational cognition model of perception, memory, and judgment.
Sci. China Inf. Sci., 2014

Inferring Emotions from Social Images Leveraging Influence Analysis.
Proceedings of the Social Media Processing - Third National Conference, 2014

Learning to Infer Public Emotions from Large-Scale Networked Voice Data.
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014

User-level psychological stress detection from social media using deep neural network.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014

Automatic speech data clustering with human perception based weighted distance.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Algorithm of pure tone audiometry based on multiple judgment.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Psychological stress detection from cross-media microblog data using Deep Sparse Neural Network.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Helping Teenagers Relieve Psychological Pressures: A Micro-blog Based System.
Proceedings of the 17th International Conference on Extending Database Technology, 2014

How Do Your Friends on Social Media Disclose Your Emotions?
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

Affective image adjustment with a single word.
Vis. Comput., 2013

WeCard: a multimodal solution for making personalized electronic greeting cards.
Proceedings of the ACM Multimedia Conference, 2013

SNR estimation for clipped audio based on amplitude distribution.
Proceedings of the Ninth International Conference on Natural Computation, 2013

Interpretable aesthetic features for affective image classification.
Proceedings of the IEEE International Conference on Image Processing, 2013

TalkingAndroid: An interactive, multimodal and real-time talking avatar application on mobile phones.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

A new method for the objective perceptual measurement of Chinese initials.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Affective Image Colorization.
J. Comput. Sci. Technol., 2012

Comparison of adaptation methods for GMM-SVM based speech emotion recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Understanding the emotional impact of images.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Can we understand van Gogh's mood?: learning to infer affects from images in social networks.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

A real-time tone enhancement method for continuous Mandarin speeches.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Analysis on mispronunciations in CAPT based on computational speech perception.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Intention understanding based on multi-source information integration for Chinese Mandarin spoken commands.
Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012

Image Colorization with an Affective Word.
Proceedings of the Computational Visual Media - First International Conference, 2012

Modeling the correlation between modality semantics and facial expressions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012

Emotional Audio-Visual Speech Synthesis Based on PAD.
IEEE Trans. Speech Audio Process., 2011

Emotional talking agent: System and evaluation.
Proceedings of the Sixth International Conference on Natural Computation, 2010

Facial expression synthesis based on motion patterns learned from face database.
Proceedings of the International Conference on Image Processing, 2010

Analysis and Modeling of Affective Audio Visual Speech Based on PAD Emotion Space.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Fingerprint matching based on weighting method and the SVM.
Neurocomputing, 2007

Fake Finger Detection Based on Time-Series Fingerprint Image Analysis.
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2007

A New Approach to Fake Finger Detection Based on Skin Elasticity Analysis.
Proceedings of the Advances in Biometrics, International Conference, 2007

A TSVM-Based Minutiae Matching Approach for Fingerprint Verification.
Proceedings of the Advances in Biometric Person Authentication, 2005
