Jia Jia
ORCID: 0009-0005-8449-278X
Affiliations:
- Tsinghua University, Graduate School at Shenzhen, Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems, China
- Tsinghua University, Department of Computer Science and Technology, TNList, Beijing, China (PhD 2008)
According to our database, Jia Jia authored at least 174 papers between 2005 and 2024.
Bibliography
2024
VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding.
CoRR, 2024
Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
SoulSkipper: A Voice-Controlled Emotional Adaptive Game to Complement Therapy for Social Anxiety Disorder.
Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024
2023
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos.
CoRR, 2023
CoRR, 2023
MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation.
CoRR, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
AvatarFusion: Zero-shot Generation of Clothing-Decoupled 3D Avatars Using 2D Diffusion.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
SDDM: Score-Decomposed Diffusion Models on Manifolds for Unpaired Image-to-Image Translation.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
MSNet: A Deep Architecture Using Multi-Sentiment Semantics for Sentiment-Aware Image Style Transfer.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
CatHill: Emotion-Based Interactive Storytelling Game as a Digital Mental Health Intervention.
Proceedings of the Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Inferring Speaking Styles from Multi-modal Conversational Context by Multi-scale Relational Graph Convolutional Networks.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the International Joint Conference on Neural Networks, 2022
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022
2021
CoRR, 2021
Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the CHI PLAY '21: The Annual Symposium on Computer-Human Interaction in Play, 2021
PTeacher: a Computer-Aided Personalized Pronunciation Training System with Exaggerated Audio-Visual Corrective Feedback.
Proceedings of the CHI '21: CHI Conference on Human Factors in Computing Systems, 2021
Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Inferring Emotion from Large-scale Internet Voice Data: A Semi-supervised Curriculum Augmentation based Deep Learning Approach.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Corrective feedback, emphatic speech synthesis, visual-speech exaggeration, pronunciation learning.
CoRR, 2020
Inferring Emphasis for Real Voice Data: An Attentive Multimodal Neural Network Approach.
Proceedings of the MultiMedia Modeling - 26th International Conference, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
Re-Weighted Interval Loss for Handling Data Imbalance Problem of End-to-End Keyword Spotting.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Enhancing Music Recommendation with Social Media Content: an Attentive Multimodal Autoencoder Approach.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Online Intelligent Music Recommendation: The Opportunity and Challenge for People Well-Being Improvement.
Proceedings of the 2nd IEEE International Conference on Cognitive Machine Intelligence, 2020
Mining Unfollow Behavior in Large-Scale Online Social Networks via Spatial-Temporal Interaction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
PEIA: Personality and Emotion Integrated Attentive Model for Music Recommendation on Social Media Platforms.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional modelling.
CoRR, 2019
Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding.
CoRR, 2019
Understanding the Teaching Styles by an Attention based Multi-task Cross-media Dimensional Modeling.
Proceedings of the 27th ACM International Conference on Multimedia, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Design and Implementation of a Disambiguity Framework for Smart Voice Controlled Devices.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Modeling Emotion Influence Using Attention-based Graph Convolutional Recurrent Network.
Proceedings of the International Conference on Multimodal Interaction, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
A Compact Framework for Voice Conversion Using Wavenet Conditioned on Phonetic Posteriorgrams.
Proceedings of the IEEE International Conference on Acoustics, 2019
Dilated Residual Network with Multi-head Self-attention for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Learning Discriminative Features from Spectrograms Using Center Loss for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the Human-Computer Interaction. Perspectives on Design, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Learning Contextual Representation with Convolution Bank and Multi-head Self-attention for Speech Emphasis Detection.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
2018
Proceedings of the Companion of the The Web Conference 2018 on The Web Conference 2018, 2018
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Inferring User Emotive State Changes in Realistic Human-Computer Conversational Dialogs.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 15th International Conference on Spoken Language Translation, 2018
Emphasis Detection for Voice Dialogue Applications Using Multi-channel Convolutional Bidirectional Long Short-Term Memory Network.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Emotion Recognition from Variable-Length Speech Segments Using Deep Learning on Spectrograms.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018
Proceedings of the 2018 IEEE International Conference on Multimedia and Expo, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Emphatic Speech Generation with Conditioned Input Layer and Bidirectional LSTMS for Expressive Speech Synthesis.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Inferring Emotion from Conversational Voice Data: A Semi-Supervised Multi-Path Generative Neural Network Approach.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2017
IEEE Trans. Multim., 2017
IEEE Trans. Multim., 2017
IEEE Trans. Mob. Comput., 2017
IEEE Trans. Knowl. Data Eng., 2017
Analyzing and Identifying Teens' Stressful Periods and Stressor Events From a Microblog.
IEEE J. Biomed. Health Informatics, 2017
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Proceedings of the 2017 ACM on Multimedia Conference, 2017
Speech Emotion Recognition with Emotion-Pair Based Framework Considering Emotion Distribution Information in Dimensional Emotion Space.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Depression Detection via Harvesting Social Media: A Multimodal Dictionary Learning Solution.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017
Inferring emotions from heterogeneous social media data: A Cross-media Auto-Encoder solution.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Learning cross-lingual knowledge with multilingual BLSTM for emphasis detection with limited training data.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Multi-Task Deep Learning for User Intention Understanding in Speech Interaction Systems.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
Towards Better Understanding the Clothing Fashion Styles: A Multimodal Deep Learning Approach.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
SenseRun: Real-Time Running Routes Recommendation towards Providing Pleasant Running Experiences.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
A Virtual Personal Fashion Consultant: Learning from the Personal Preference of Fashion.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
Learning robust uniform features for cross-media social data by using cross autoencoders.
Knowl. Based Syst., 2016
Health Inf. Sci. Syst., 2016
CoRR, 2016
Proceedings of the Web Information Systems Engineering - WISE 2016, 2016
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
THear: Development of a mobile multimodal audiometry application on a cross-platform framework.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016
Low level descriptors based DBLSTM bottleneck feature for speech driven talking avatar.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016
Moodee: An Intelligent Mobile Companion for Sensing Your Stress from Your Social Media Postings.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016
2015
IEEE Trans. Affect. Comput., 2015
Multim. Tools Appl., 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Teenagers' Stress Detection Based on Time-Sensitive Micro-blog Comment/Response Actions.
Proceedings of the Artificial Intelligence in Theory and Practice IV, 2015
Proceedings of the Engineering the Web in the Big Data Era - 15th International Conference, 2015
Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, Seattle, WA, USA, November 09, 2015
Proceedings of the 2015 IEEE International Conference on Multimedia and Expo, 2015
HMM-based emphatic speech synthesis for corrective feedback in computer-aided pronunciation training.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the Health Information Science - 4th International Conference, 2015
Understanding speaking styles of internet speech data with LSTM and low-resource training.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training.
Multim. Tools Appl., 2014
Multim. Tools Appl., 2014
Grading the Severity of Mispronunciations in CAPT Based on Statistical Analysis and Computational Speech Perception.
J. Comput. Sci. Technol., 2014
Sci. China Inf. Sci., 2014
Proceedings of the Social Media Processing - Third National Conference, 2014
Proceedings of the MultiMedia Modeling - 20th Anniversary International Conference, 2014
User-level psychological stress detection from social media using deep neural network.
Proceedings of the ACM International Conference on Multimedia, MM '14, Orlando, FL, USA, November 03, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Using conditional random fields to predict focus word pair in spontaneous spoken English.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Acoustics, content and geo-information based sentiment prediction from large-scale networked voice data.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014
Psychological stress detection from cross-media microblog data using Deep Sparse Neural Network.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014
Proceedings of the 17th International Conference on Extending Database Technology, 2014
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014
2013
Proceedings of the ACM Multimedia Conference, 2013
Proceedings of the Ninth International Conference on Natural Computation, 2013
Proceedings of the IEEE International Conference on Image Processing, 2013
TalkingAndroid: An interactive, multimodal and real-time talking avatar application on mobile phones.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
2012
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012
Can we understand van Gogh's mood?: learning to infer affects from images in social networks.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012
Adaptive named entity recognition based on conditional random fields with automatic updated dynamic gazetteers.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Perceptual clustering based unit selection optimization for concatenative text-to-speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Hierarchical English Emphatic Speech Synthesis Based on HMM with Limited Training Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Intention understanding based on multi-source information integration for Chinese Mandarin spoken commands.
Proceedings of the 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012
Proceedings of the Computational Visual Media - First International Conference, 2012
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2012
2011
IEEE Trans. Speech Audio Process., 2011
2010
Proceedings of the Sixth International Conference on Natural Computation, 2010
Proceedings of the International Conference on Image Processing, 2010
2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
2007
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2007
Proceedings of the Advances in Biometrics, International Conference, 2007
2005
Proceedings of the Advances in Biometric Person Authentication, 2005