Jianhua Tao
ORCID: 0000-0002-9344-6428
Affiliations:
- Tsinghua University, Department of Automation, Beijing, China
- University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, China
- Tsinghua University, Beijing, China (PhD 2001)
According to our database, Jianhua Tao authored at least 456 papers between 1998 and 2024.
Online presence: orcid.org, dl.acm.org
Bibliography
2024
Multimodal Cross-Lingual Summarization for Videos: A Revisit in Knowledge Distillation Induced Triple-Stage Training Method.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024
DepressionMLP: A Multi-Layer Perceptron Architecture for Automatic Depression Level Prediction via Facial Keypoints and Action Units.
IEEE Trans. Circuits Syst. Video Technol., September, 2024
PIRNet: Personality-Enhanced Iterative Refinement Network for Emotion Recognition in Conversation.
IEEE Trans. Neural Networks Learn. Syst., February, 2024
Knowl. Based Syst., January, 2024
ICaps-ResLSTM: Improved capsule network and residual LSTM for EEG emotion recognition.
Biomed. Signal Process. Control., January, 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis.
IEEE Trans. Affect. Comput., 2024
IEEE Trans. Affect. Comput., 2024
Pattern Recognit., 2024
Bayesian hypernetwork collaborates with time-difference evolutional network for temporal knowledge prediction.
Neural Networks, 2024
DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection.
Neural Networks, 2024
Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection.
Neural Networks, 2024
HiCMAE: Hierarchical Contrastive Masked Autoencoder for self-supervised Audio-Visual Emotion Recognition.
Inf. Fusion, 2024
Inf. Fusion, 2024
DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection.
CoRR, 2024
CoRR, 2024
WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification.
CoRR, 2024
CoRR, 2024
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech.
CoRR, 2024
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation.
CoRR, 2024
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models.
CoRR, 2024
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
CoRR, 2024
EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech.
CoRR, 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024
CoRR, 2024
An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio.
CoRR, 2024
CoRR, 2024
ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation.
CoRR, 2024
CoRR, 2024
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection.
CoRR, 2024
Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy.
CoRR, 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio.
CoRR, 2024
KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering.
CoRR, 2024
Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild.
CoRR, 2024
Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning.
CoRR, 2024
CoRR, 2024
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 2024
Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
MRAC'24 Track 2: 2nd International Workshop on Multimodal and Responsible Affective Computing.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
APC: Predict Global Representation From Local Observation In Multi-Agent Reinforcement Learning.
Proceedings of the International Joint Conference on Neural Networks, 2024
What Comes Next and Why? A Staged Encoder-Decoder Architecture for Script Event Prediction.
Proceedings of the International Joint Conference on Neural Networks, 2024
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Multi-stage Vs Single-Stage: A Local Information Focused Approach for Overlapping Event Extraction.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2024, 2024
NLoPT: N-gram Enhanced Low-Rank Task Adaptive Pre-training for Efficient Language Model Adaption.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024
Proceedings of the Chinese Computational Linguistics - 23rd China National Conference, 2024
Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms.
Proceedings of the Chinese Computational Linguistics - 23rd China National Conference, 2024
Proceedings of the ACM Turing Award Celebration Conference 2024, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Proc. IEEE, October, 2023
Neurocomputing, October, 2023
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023
Speech Commun., April, 2023
Neural Networks, March, 2023
Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Dual Attention and Element Recalibration Networks for Automatic Depression Level Prediction.
IEEE Trans. Affect. Comput., 2023
IEEE Trans. Affect. Comput., 2023
SMIN: Semi-Supervised Multi-Modal Interaction Network for Conversational Emotion Recognition.
IEEE Trans. Affect. Comput., 2023
Neurocomputing, 2023
RMNAS: A Multimodal Neural Architecture Search Framework For Robust Multimodal Sentiment Analysis.
CoRR, 2023
CoRR, 2023
CoRR, 2023
Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection.
CoRR, 2023
CoRR, 2023
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition.
CoRR, 2023
TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023
Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023
Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
Proceedings of the 31st ACM International Conference on Multimedia, 2023
MRAC'23: 1st International Workshop on Multimodal and Responsible Affective Computing.
Proceedings of the 31st ACM International Conference on Multimedia, 2023
SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Learning Item Attributes and User Interests for Knowledge Graph Enhanced Recommendation.
Proceedings of the Neural Information Processing - 30th International Conference, 2023
Proceedings of the International Conference on Machine Learning, 2023
M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2023
GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with 32th International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023
Proceedings of the Artificial Intelligence - Third CAAI International Conference, 2023
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
2022
Selective Element and Two Orders Vectorization Networks for Automatic Depression Severity Diagnosis via Facial Changes.
IEEE Trans. Circuits Syst. Video Technol., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
Emotional Conversation Generation Orientated Syntactically Constrained Bidirectional-Asynchronous Framework.
IEEE Trans. Affect. Comput., 2022
Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.
IEEE Signal Process. Lett., 2022
Speech Commun., 2022
Knowl. Based Syst., 2022
Predicting the Epidemics Trend of COVID-19 Using Epidemiological-Based Generative Adversarial Networks.
IEEE J. Sel. Top. Signal Process., 2022
IEEE J. Sel. Top. Signal Process., 2022
CoRR, 2022
System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation.
CoRR, 2022
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis.
CoRR, 2022
CoRR, 2022
Reducing language context confusion for end-to-end code-switching automatic speech recognition.
CoRR, 2022
Cogn. Comput. Syst., 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Emotional Reaction Analysis based on Multi-Label Graph Convolutional Networks and Dynamic Facial Expression Recognition Transformer.
Proceedings of the MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022
Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.
Proceedings of the DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Reducing multilingual context confusion for end-to-end code-switching automatic speech recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022
Proceedings of the Asia Conference on Algorithms, Computing and Machine Learning, 2022
2021
Virtual Real. Intell. Hardw., 2021
Virtual Real. Intell. Hardw., 2021
Virtual Real. Intell. Hardw., 2021
Learning long-term temporal contexts using skip RNN for continuous emotion recognition.
Virtual Real. Intell. Hardw., 2021
Design and Analysis of a Human-Machine Interaction System for Researching Human's Dynamic Emotion.
IEEE Trans. Syst. Man Cybern. Syst., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
IEEE ACM Trans. Audio Speech Lang. Process., 2021
IEEE Signal Process. Mag., 2021
IEEE Signal Process. Mag., 2021
Speech Commun., 2021
Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
Neural Networks, 2021
Knowl. Based Syst., 2021
A time-frequency channel attention and vectorization network for automatic depression level prediction.
Neurocomputing, 2021
Neurocomputing, 2021
Neurocomputing, 2021
Int. J. Autom. Comput., 2021
CoRR, 2021
Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.
CoRR, 2021
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021
Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model.
Proceedings of the MuSe '21: Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge, 2021
Multimodal Sentiment Analysis based on Recurrent Neural Network and Multimodal Attention.
Proceedings of the MuSe '21: Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge, 2021
Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Facial Micro-Expression Recognition Based on Multi-Scale Temporal and Spatial Features.
Proceedings of the ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021
Proceedings of the ICMI '21: International Conference on Multimodal Interaction, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction.
Proceedings of the IEEE International Conference on Acoustics, 2021
Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2021
Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
J. Signal Process. Syst., 2020
ACM Trans. Inf. Syst., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
Pattern Recognit., 2020
Automatic Assessment of Depression From Speech via a Hierarchical Attention Transfer Network and Attention Autoencoders.
IEEE J. Sel. Top. Signal Process., 2020
Emotional editing constraint conversation content generation based on reinforcement learning.
Inf. Fusion, 2020
Int. J. Autom. Comput., 2020
Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method.
CoRR, 2020
Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features.
CoRR, 2020
Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism.
Proceedings of the MuSe'20: Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the ICCPR 2020: 9th International Conference on Computing and Pattern Recognition, Xiamen, China, October 30, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020
ParamE: Regarding Neural Network Parameters as Relation Embeddings for Knowledge Graph Completion.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Virtual Real. Intell. Hardw., 2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Int. J. Autom. Comput., 2019
CoRR, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Self-attention Based Model for Punctuation Prediction Using Word and Speech Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019
Discriminative Video Representation with Temporal Order for Micro-expression Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Phoneme Dependent Speaker Embedding and Model Factorization for Multi-speaker Speech Synthesis and Adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction, 2019
2018
Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin.
J. Signal Process. Syst., 2018
CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition.
J. Signal Process. Syst., 2018
Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.
J. Signal Process. Syst., 2018
IEEE ACM Trans. Audio Speech Lang. Process., 2018
Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition.
CoRR, 2018
CoRR, 2018
ASMMC-MMAC 2018: The Joint Workshop of 4th the Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018
Multimodal Continuous Emotion Recognition with Data Augmentation Using Recurrent Neural Networks.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018
A Novel Unified Framework for Speech Enhancement and Bandwidth Extension Based on Jointly Trained Neural Networks.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018
BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Sparsity-Constrained Weight Mapping for Head-Related Transfer Functions Individualization from Anthropometric Features.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Pen Tip Motion Prediction for Handwriting Drawing Order Recovery using Deep Neural Network.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Reducing Tongue Shape Dimensionality from Hundreds of Available Resources Using Autoencoder.
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Proceedings of the 24th International Conference on Pattern Recognition, 2018
Architecture and Parameter Analysis to Convolutional Neural Network for Hand Tracking.
Proceedings of the Cloud Computing and Security - 4th International Conference, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
2017
Quantitative intonation modeling of interrogative sentences for Mandarin speech synthesis.
Speech Commun., 2017
J. Ambient Intell. Humaniz. Comput., 2017
Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics, 2017
Research on modeling and machining algorithm of multi-shear and multi-punch CNC transverse shear line.
Proceedings of the 2017 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, 2017
Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017
Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
A novel pitch extraction based on jointly trained deep BLSTM Recurrent Neural Networks with bottleneck features.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017
2016
Speech Enhancement Based on Analysis-Synthesis Framework with Improved Parameter Domain Enhancement.
J. Signal Process. Syst., 2016
J. Signal Process. Syst., 2016
Investigating Effect of Rich Syntactic Features on Mandarin Prosodic Boundaries Prediction.
J. Signal Process. Syst., 2016
Multim. Tools Appl., 2016
CoRR, 2016
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016
Text-based sentential stress prediction using continuous lexical embedding for Mandarin speech synthesis.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Improving accented Mandarin speech recognition by using recurrent neural network based language model adaptation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
End-to-end keywords spotting based on connectionist temporal classification for Mandarin.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016
Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Long short term memory recurrent neural network based encoding method for emotion recognition in video.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016
Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
Hierarchical stress modeling and generation in Mandarin for expressive Text-to-Speech.
Speech Commun., 2015
Multim. Tools Appl., 2015
Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015
Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the IEEE International Conference on Communication, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Evaluation of linear regression for speaker adaptation in HMM-based articulatory movements estimation.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
From simulated speech to natural speech, what are the robust features for emotion recognition?
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015
2014
J. Signal Process. Syst., 2014
J. Signal Process. Syst., 2014
IEEE J. Sel. Top. Signal Process., 2014
Phonological influences on the realization of final lowering evidence from dialogue Chinese Mandarin.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Evaluation of parameter generation using high order dynamic features and long span windows for HMM based speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Context features based pre-selection and weight prediction in concatenation speech synthesis system.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Investigating effect of rich syntactic features on Mandarin prosodic phrase boundaries prediction.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Improving generation performance of speech emotion recognition by denoising autoencoders.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
A novel hybrid Mandarin speech synthesis system using different base units for model training and concatenation.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
2013
A novel unit selection method for concatenation speech system using similarity measure.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013
Stress prediction for Mandarin text-to-speech system using discourse context feature.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
Proceedings of the Chinese Lexical Semantics - 14th Workshop, 2013
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013
Proceedings of the 2nd IAPR Asian Conference on Pattern Recognition, 2013
Bayesian Inference Based Temporal Modeling for Naturalistic Affective Expression Classification.
Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013
2012
J. Multimodal User Interfaces, 2012
EURASIP J. Adv. Signal Process., 2012
Statistical modification based post-filtering technique for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Multimodal emotion estimation and emotional synthesize for interaction virtual agent.
Proceedings of the 2nd IEEE International Conference on Cloud Computing and Intelligence Systems, 2012
2011
EURASIP J. Adv. Signal Process., 2011
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011
Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 17th International Congress of Phonetic Sciences, 2011
Global variance modeling on frequency domain delta LSP for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the Affective Computing and Intelligent Interaction, 2011
2010
IEEE Trans. Speech Audio Process., 2010
Proceedings of the 4th International Universal Communication Symposium, 2010
Proceedings of the 4th International Universal Communication Symposium, 2010
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the 12th International Conference on Multimodal Interfaces / 7. International Workshop on Machine Learning for Multimodal Interaction, 2010
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010
Proceedings of the 7th Symposium on Applied Perception in Graphics and Visualization, 2010
2009
IEEE Trans. Speech Audio Process., 2009
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009
Proceedings of the 2009 International Conference on Environmental Science and Information Application Technology, 2009
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009
Categorizing terms' subjectivity and polarity manually for opinion mining in Chinese.
Proceedings of the Affective Computing and Intelligent Interaction, 2009
Proceedings of the Affective Computing and Intelligent Interaction, 2009
Proceedings of the Affective Information Processing, 2009
Proceedings of the Affective Information Processing, 2009
2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
A Maximum Entropy Based Hierarchical Model for Automatic Prosodic Boundary Labeling in Mandarin.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008
A Novel Classifier Based on Enhanced Lipschitz Embedding for Speech Emotion Recognition.
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the Blizzard Challenge 2008, 2008
2007
Int. J. Comput. Linguistics Chin. Lang. Process., 2007
Development of an integrated model for assessing the impact of diffuse and point source pollution on coastal waters.
Environ. Model. Softw., 2007
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007
Proceedings of the International Conference on Image Processing, 2007
Speech Emotion Recognition Based on a Fusion of All-Class and Pairwise-Class Feature Selection.
Proceedings of the Computational Science, 2007
Proceedings of the IEEE International Conference on Acoustics, 2007
Proceedings of the Affective Computing and Intelligent Interaction, 2007
Proceedings of the Affective Computing and Intelligent Interaction, 2007
Proceedings of the Affective Computing and Intelligent Interaction, 2007
2006
IEEE Trans. Speech Audio Process., 2006
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006
A New Pitch Generation Model Based on Internal Dependence of Pitch Contour for Mandarin TTS System.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
Applying Pitch Target Model to Convert F0 Contour for Expressive Mandarin Speech Synthesis.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005
Proceedings of the Affective Computing and Intelligent Interaction, 2005
Proceedings of the Affective Computing and Intelligent Interaction, 2005
Proceedings of the Affective Computing and Intelligent Interaction, 2005
Personalized Facial Animation Based on 3D Model Fitting from Two Orthogonal Face Images.
Proceedings of the Affective Computing and Intelligent Interaction, 2005
Proceedings of the Affective Computing and Intelligent Interaction, 2005
2004
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004
A new multicomponent AM-FM demodulation with predicting frequency boundaries and its application to formant estimation.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004
2003
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
2002
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002
Proceedings of the First Workshop on Chinese Language Processing, 2002
2000
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000
1998
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998