Jianhua Tao

ORCID: 0000-0002-9344-6428

Affiliations:
  • Tsinghua University, Department of Automation, Beijing, China
  • University of Chinese Academy of Sciences, School of Artificial Intelligence, Beijing, China
  • Tsinghua University, Beijing, China (PhD 2001)


According to our database, Jianhua Tao authored at least 456 papers between 1998 and 2024.

Bibliography

2024
Multimodal Cross-Lingual Summarization for Videos: A Revisit in Knowledge Distillation Induced Triple-Stage Training Method.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

DepressionMLP: A Multi-Layer Perceptron Architecture for Automatic Depression Level Prediction via Facial Keypoints and Action Units.
IEEE Trans. Circuits Syst. Video Technol., September, 2024

PIRNet: Personality-Enhanced Iterative Refinement Network for Emotion Recognition in Conversation.
IEEE Trans. Neural Networks Learn. Syst., February, 2024

Multi-level graph contrastive learning.
Neurocomputing, February, 2024

VLP2MSA: Expanding vision-language pre-training to multimodal sentiment analysis.
Knowl. Based Syst., January, 2024

ICaps-ResLSTM: Improved capsule network and residual LSTM for EEG emotion recognition.
Biomed. Signal Process. Control., January, 2024

Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Efficient Multimodal Transformer With Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis.
IEEE Trans. Affect. Comput., 2024

WavDepressionNet: Automatic Depression Level Prediction via Raw Speech Signals.
IEEE Trans. Affect. Comput., 2024

CFAD: A Chinese dataset for fake audio detection.
Speech Commun., 2024

Zero-shot voice conversion based on feature disentanglement.
Speech Commun., 2024

SceneFake: An initial dataset and benchmarks for scene fake audio detection.
Pattern Recognit., 2024

Bayesian hypernetwork collaborates with time-difference evolutional network for temporal knowledge prediction.
Neural Networks, 2024

DGSD: Dynamical graph self-distillation for EEG-based auditory spatial attention detection.
Neural Networks, 2024

Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection.
Neural Networks, 2024

M2ixKG: Mixing for harder negative samples in knowledge graph.
Neural Networks, 2024

HiCMAE: Hierarchical Contrastive Masked Autoencoder for self-supervised Audio-Visual Emotion Recognition.
Inf. Fusion, 2024

GPT-4V with emotion: A zero-shot benchmark for Generalized Emotion Recognition.
Inf. Fusion, 2024

DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection.
CoRR, 2024

Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark.
CoRR, 2024

WMCodec: End-to-End Neural Speech Codec with Deep Watermarking for Authenticity Verification.
CoRR, 2024

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0.
CoRR, 2024

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech.
CoRR, 2024

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation.
CoRR, 2024

Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models.
CoRR, 2024

Exploring the Role of Audio in Multimodal Misinformation Detection.
CoRR, 2024

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
CoRR, 2024

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech.
CoRR, 2024

A Noval Feature via Color Quantisation for Fake Audio Detection.
CoRR, 2024

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing.
CoRR, 2024

ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild.
CoRR, 2024

MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics.
CoRR, 2024

An Unsupervised Domain Adaptation Method for Locating Manipulated Region in partially fake Audio.
CoRR, 2024

AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition.
CoRR, 2024

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation.
CoRR, 2024

Fake News Detection and Manipulation Reasoning via Large Vision-Language Models.
CoRR, 2024

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation.
CoRR, 2024

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio.
CoRR, 2024

RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection.
CoRR, 2024

TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking.
CoRR, 2024

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation.
CoRR, 2024

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection.
CoRR, 2024

Generalized Source Tracing: Detecting Novel Audio Deepfake Algorithm with Real Emphasis and Fake Dispersion Strategy.
CoRR, 2024

Generalized Fake Audio Detection via Deep Stable Learning.
CoRR, 2024

EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark.
CoRR, 2024

Can large language models understand uncommon meanings of common words?
CoRR, 2024

The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio.
CoRR, 2024

KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering.
CoRR, 2024

Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild.
CoRR, 2024

Can Deception Detection Go Deeper? Dataset, Evaluation, and Benchmark for Deception Reasoning.
CoRR, 2024

MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition.
CoRR, 2024

SVFAP: Self-supervised Video Facial Affect Perceiver.
CoRR, 2024

Emotion selectable end-to-end text-based speech editing.
Artif. Intell., 2024

Bilateral Masking with prompt for Knowledge Graph Completion.
Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions.
Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 2024

DPP: A Dual-Phase Processing Method for Cross-Cultural Humor Detection.
Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 2024

MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

MRAC'24 Track 2: 2nd International Workshop on Multimodal and Responsible Affective Computing.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

Label-Efficient Emotion and Sentiment Analysis.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Utilizing Speaker Profiles for Impersonation Audio Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

MSFNet: Multi-Scale Fusion Network for Brain-Controlled Speaker Extraction.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

APC: Predict Global Representation From Local Observation In Multi-Agent Reinforcement Learning.
Proceedings of the International Joint Conference on Neural Networks, 2024

What Comes Next and Why? A Staged Encoder-Decoder Architecture for Script Event Prediction.
Proceedings of the International Joint Conference on Neural Networks, 2024

Dual-View Multimodal Interaction in Multimodal Sentiment Analysis.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Pseudo Labels Regularization for Imbalanced Partial-Label Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multi-Scale Permutation Entropy for Audio Deepfake Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Fewer-Token Neural Speech Codec with Time-Invariant Codes.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multi-stage Vs Single-Stage: A Local Information Focused Approach for Overlapping Event Extraction.
Artificial Neural Networks and Machine Learning - ICANN 2024, 2024

NLoPT: N-gram Enhanced Low-Rank Task Adaptive Pre-training for Efficient Language Model Adaption.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

EmoFake: An Initial Dataset for Emotion Fake Audio Detection.
Chinese Computational Linguistics - 23rd China National Conference, 2024

Distinguishing Neural Speech Synthesis Models Through Fingerprints in Speech Waveforms.
Chinese Computational Linguistics - 23rd China National Conference, 2024

Open-world Domain Adaptation and Generalization.
Proceedings of the ACM Turing Award Celebration Conference 2024, 2024

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era.
Proc. IEEE, October, 2023

Spatial-temporal knowledge graph network for event prediction.
Neurocomputing, October, 2023

GCNet: Graph Completion Network for Incomplete Multimodal Learning in Conversation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Transfer knowledge for punctuation prediction via adversarial training.
Speech Commun., April, 2023

Adaptive pseudo-Siamese policy network for temporal knowledge prediction.
Neural Networks, March, 2023

Adversarial Multi-Task Learning for Mandarin Prosodic Boundary Prediction With Multi-Modal Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Dual Attention and Element Recalibration Networks for Automatic Depression Level Prediction.
IEEE Trans. Affect. Comput., 2023

Multimodal Spatiotemporal Representation for Automatic Depression Level Detection.
IEEE Trans. Affect. Comput., 2023

SMIN: Semi-Supervised Multi-Modal Interaction Network for Conversational Emotion Recognition.
IEEE Trans. Affect. Comput., 2023

Hierarchical graph attention network for temporal knowledge graph reasoning.
Neurocomputing, 2023

RMNAS: A Multimodal Neural Architecture Search Framework For Robust Multimodal Sentiment Analysis.
CoRR, 2023

What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection.
CoRR, 2023

GPT-4V with Emotion: A Zero-shot Benchmark for Multimodal Emotion Understanding.
CoRR, 2023

Learning to Behave Like Clean Speech: Dual-Branch Knowledge Distillation for Noise-Robust Fake Audio Detection.
CoRR, 2023

Fewer-token Neural Speech Codec with Time-invariant Codes.
CoRR, 2023

Controllable Residual Speaker Representation for Voice Conversion.
CoRR, 2023

Audio Deepfake Detection: A Survey.
CoRR, 2023

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
CoRR, 2023

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition.
CoRR, 2023

Explainable Multimodal Emotion Reasoning.
CoRR, 2023

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion.
CoRR, 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.
CoRR, 2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing.
CoRR, 2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
CoRR, 2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.
CoRR, 2023

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning.
CoRR, 2023

DALI: Dynamically Adjusted Label Importance for Noisy Partial Label Learning.
CoRR, 2023

UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion.
CoRR, 2023

VRA: Variational Rectified Activation for Out-of-distribution Detection.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ALIM: Adjusting Label Importance Mechanism for Noisy Partial Label Learning.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Exclusive Modeling for MuSe-Personalisation Challenge.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Integrating VideoMAE based model and Optical Flow for Micro- and Macro-expression Spotting.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

MRAC'23: 1st International Workshop on Multimodal and Responsible Affective Computing.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

SOT: Self-supervised Learning-Assisted Optimal Transport for Unsupervised Adaptive Speech Emotion Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning Item Attributes and User Interests for Knowledge Graph Enhanced Recommendation.
Neural Information Processing - 30th International Conference, 2023

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection.
Proceedings of the International Conference on Machine Learning, 2023

M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

GCC-Speaker: Target Speaker Localization with Optimal Speaker-Dependent Weighting in Multi-Speaker Scenarios.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Adaptive Fake Audio Detection with Low-Rank Model Squeezing.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

ADD 2023: the Second Audio Deepfake Detection Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

TST: Time-Sparse Transducer for Automatic Speech Recognition.
Artificial Intelligence - Third CAAI International Conference, 2023

The VIBVG Speech Synthesis System for Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023

Hybrid Multi-Task Learning for End-To-End Multimodal Emotion Recognition.
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

2022
Selective Element and Two Orders Vectorization Networks for Automatic Depression Severity Diagnosis via Facial Changes.
IEEE Trans. Circuits Syst. Video Technol., 2022

CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation.
IEEE ACM Trans. Audio Speech Lang. Process., 2022

Emotional Conversation Generation Orientated Syntactically Constrained Bidirectional-Asynchronous Framework.
IEEE Trans. Affect. Comput., 2022

Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition.
IEEE Signal Process. Lett., 2022

One-shot emotional voice conversion based on feature separation.
Speech Commun., 2022

Tucker decomposition-based temporal knowledge graph completion.
Knowl. Based Syst., 2022

Predicting the Epidemics Trend of COVID-19 Using Epidemiological-Based Generative Adversarial Networks.
IEEE J. Sel. Top. Signal Process., 2022

Editorial: Intelligent Signal Analysis for Contagious Virus Diseases.
IEEE J. Sel. Top. Signal Process., 2022

SceneFake: An Initial Dataset and Benchmarks for Scene Fake Audio Detection.
CoRR, 2022

EmoFake: An Initial Dataset for Emotion Fake Audio Detection.
CoRR, 2022

ARNet: Automatic Refinement Network for Noisy Partial Label Learning.
CoRR, 2022

System Fingerprints Detection for DeepFake Audio: An Initial Dataset and Investigation.
CoRR, 2022

Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis.
CoRR, 2022

Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge.
CoRR, 2022

EmotionNAS: Two-stream Architecture Search for Speech Emotion Recognition.
CoRR, 2022

MixKG: Mixing for harder negative samples in knowledge graph.
CoRR, 2022

ADD 2022: the First Audio Deep Synthesis Detection Challenge.
CoRR, 2022

Reducing language context confusion for end-to-end code-switching automatic speech recognition.
CoRR, 2022

AHRNN: Attention-Based Hybrid Robust Neural Network for emotion recognition.
Cogn. Comput. Syst., 2022

An Initial Investigation for Detecting Vocoder Fingerprints of Fake Audio.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Fully Automated End-to-End Fake Audio Detection.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

Emotional Reaction Analysis based on Multi-Label Graph Convolutional Networks and Dynamic Facial Expression Recognition Transformer.
MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022

Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis.
DDAM@MM 2022: Proceedings of the 1st International Workshop on Deepfake Detection for Audio Multimedia, 2022

DDAM '22: 1st International Workshop on Deepfake Detection for Audio Multimedia.
MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Multimodal Temporal Attention in Sentiment Analysis.
MuSe@MM 2022: Proceedings of the 3rd International on Multimodal Sentiment Analysis Workshop and Challenge, 2022

Masking-based Neural Beamformer for Multichannel Speech Enhancement.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Prediction of Depression Severity Based on Transformer Encoder and CNN Model.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Speaker recognition-assisted robust audio deepfake detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Reducing multilingual context confusion for end-to-end code-switching automatic speech recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ADD 2022: the first Audio Deep Synthesis Detection Challenge.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Automatic Depression Level Assessment from Speech By Long-Term Global Information Embedding.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

End-to-End Network Based on Transformer for Automatic Detection of Covid-19.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Two-Aspect Information Interaction Model for ABAW4 Multi-task Challenge.
Computer Vision - ECCV 2022 Workshops, 2022

Joint Event Extraction Based on CNN-BiGRU and Attention Mechanism.
Proceedings of the Asia Conference on Algorithms, Computing and Machine Learning, 2022

2021
Self-attention transfer networks for speech emotion recognition.
Virtual Real. Intell. Hardw., 2021

Emotion recognition for human-computer interaction.
Virtual Real. Intell. Hardw., 2021

Review of micro-expression spotting and recognition in video sequences.
Virtual Real. Intell. Hardw., 2021

Learning long-term temporal contexts using skip RNN for continuous emotion recognition.
Virtual Real. Intell. Hardw., 2021

Design and Analysis of a Human-Machine Interaction System for Researching Human's Dynamic Emotion.
IEEE Trans. Syst. Man Cybern. Syst., 2021

CTNet: Conversational Transformer Network for Emotion Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

$F_0$-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Intelligent Signal Processing for Affective Computing [From the Guest Editors].
IEEE Signal Process. Mag., 2021

Deep Learning for Mobile Mental Health: Challenges and recent advances.
IEEE Signal Process. Mag., 2021

Exploiting the directional coherence function for multichannel source extraction.
Speech Commun., 2021

Combining a parallel 2D CNN with a self-attention Dilated Residual Network for CTC-based discrete speech emotion recognition.
Neural Networks, 2021

Multi-aspect self-supervised learning for heterogeneous information network.
Knowl. Based Syst., 2021

A time-frequency channel attention and vectorization network for automatic depression level prediction.
Neurocomputing, 2021

DECN: Dialogical emotion correction network for conversational emotion recognition.
Neurocomputing, 2021

Self-supervised graph representation learning via bootstrapping.
Neurocomputing, 2021

Correction to: Semi-supervised Ladder Networks for Speech Emotion Recognition.
Int. J. Autom. Comput., 2021

Knowledge graph enhanced recommender system.
CoRR, 2021

Multi-Level Graph Contrastive Learning.
CoRR, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.
CoRR, 2021

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition.
CoRR, 2021

Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT.
CoRR, 2021

Which Phonemes Will Distinguish the Different Regions Within the Same Dialect?
Proceedings of the 24th Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques, 2021

Multimodal Emotion Recognition and Sentiment Analysis via Attention Enhanced Recurrent Model.
MuSe '21: Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge, 2021

Multimodal Sentiment Analysis based on Recurrent Neural Network and Multimodal Attention.
MuSe '21: Proceedings of the 2nd on Multimodal Sentiment Analysis Challenge, 2021

Rnn-transducer With Language Bias For End-to-end Mandarin-English Code-switching Speech Recognition.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Hierarchically Attending Time-Frequency and Channel Features for Improving Speaker Verification.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Text Enhancement for Paragraph Processing in End-to-End Code-switching TTS.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Towards Fine-Grained Prosody Control for Voice Conversion.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning.
Proceedings of the 12th International Symposium on Chinese Spoken Language Processing, 2021

Half-Truth: A Partially Fake Audio Detection Dataset.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continual Learning for Fake Audio Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TDCA-Net: Time-Domain Channel Attention Network for Depression Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Spelling Correction Conditioned on Acoustic Feature for Code-Switching Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Facial Micro-Expression Recognition Based on Multi-Scale Temporal and Spatial Features.
ICMI '21 Companion: Companion Publication of the 2021 International Conference on Multimodal Interaction, Montreal, QC, Canada, October 18, 2021

ASMMC21: The 6th International Workshop on Affective Social Multimedia Computing.
ICMI '21: International Conference on Multimodal Interaction, 2021

Patnet: A Phoneme-Level Autoregressive Transformer Network for Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Multimodal Cross- and Self-Attention Network for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Multi-Scale and Multi-Region Facial Discriminative Representation for Automatic Depression Level Prediction.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Bi-Level Style and Prosody Decoupling Modeling for Personalized End-to-End Speech Synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

One In A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020
A Public Chinese Dataset for Language Model Adaptation.
J. Signal Process. Syst., 2020

Emotional Conversation Generation Based on a Bayesian Deep Neural Network.
ACM Trans. Inf. Syst., 2020

End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Deep imitator: Handwriting calligraphy imitation via deep attention networks.
Pattern Recognit., 2020

Automatic Assessment of Depression From Speech via a Hierarchical Attention Transfer Network and Attention Autoencoders.
IEEE J. Sel. Top. Signal Process., 2020

Emotional editing constraint conversation content generation based on reinforcement learning.
Inf. Fusion, 2020

Expression Analysis Based on Face Regions in Real-world Conditions.
Int. J. Autom. Comput., 2020

Self-supervised Graph Representation Learning via Bootstrapping.
CoRR, 2020

Simultaneous Denoising and Dereverberation Using Deep Embedding Features.
CoRR, 2020

Adversarial Transfer Learning for Punctuation Restoration.
CoRR, 2020

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method.
CoRR, 2020

Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features.
CoRR, 2020

Multi-modal Continuous Dimensional Emotion Recognition Using Recurrent Neural Network and Self-Attention Mechanism.
MuSe'20: Proceedings of the 1st International on Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop, 2020

Hybrid Network Feature Extraction for Depression Assessment from Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Focal Loss for Punctuation Prediction.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Bi-Level Speaker Supervision for One-Shot Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Non-Autoregressive End-to-End TTS with Coarse-to-Fine Decoding.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

ARVC: An Auto-Regressive Voice Conversion System Without Parallel Training Data.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Conversational Emotion Recognition Using Self-Attention Mechanisms and Graph Neural Networks.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Context-Dependent Domain Adversarial Neural Network for Multimodal Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Comparison of Glottal Source Parameter Values in Emotional Vowels.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Utterance-Level Representations with Label Smoothing for Speech Emotion Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Gated Recurrent Fusion of Spatial and Spectral Features for Multi-Channel Speech Separation with Deep Embedding Representations.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

AMINN: Attention-Based Multi-Information Neural Network for Emotion Recognition.
Proceedings of the ICCPR 2020: 9th International Conference on Computing and Pattern Recognition, Xiamen, China, October 30, 2020

Synchronous Transformers for end-to-end Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Multimodal Transformer Fusion for Continuous Emotion Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Focusing on Attention: Prosody Transfer and Adaptative Optimization Strategy for Multi-Speaker End-to-End Speech Synthesis.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

CASIA Voice Conversion System for the Voice Conversion Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

The NLPR Speech Synthesis entry for Blizzard Challenge 2020.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Micro-Expression Recognition Based on Multiple Aggregation Networks.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020

ParamE: Regarding Neural Network Parameters as Relation Embeddings for Knowledge Graph Completion.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Data fusion methods in multimodal human computer dialog.
Virtual Real. Intell. Hardw., 2019

Forward-Backward Decoding Sequence for Regularizing End-to-End TTS.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Language-Adversarial Transfer Learning for Low-Resource Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Semi-supervised Ladder Networks for Speech Emotion Recognition.
Int. J. Autom. Comput., 2019

Integrating Whole Context to Sequence-to-sequence Speech Recognition.
CoRR, 2019

Expression Analysis Based on Face Regions in Read-world Conditions.
CoRR, 2019

Domain adversarial learning for emotion recognition.
CoRR, 2019

Towards Fine-Grained Prosody Control for Voice Conversion.
CoRR, 2019

Speech Emotion Recognition via Contrastive Loss under Siamese Networks.
CoRR, 2019

Reinforcement Learning Based Emotional Editing Constraint Conversation Generation.
CoRR, 2019

Forward-Backward Decoding for Regularizing End-to-End TTS.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Self-Attention Transducers for End-to-End Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Automatic Depression Level Detection via ℓp-Norm Pooling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Conversational Emotion Analysis via Attention Mechanisms.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Drawing Order Recovery for Handwriting Chinese Characters.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Language-invariant Bottleneck Features from Adversarial End-to-end Acoustic Models for Low Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Self-attention Based Model for Punctuation Prediction Using Word and Speech Embeddings.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Discriminative Video Representation with Temporal Order for Micro-expression Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Phoneme Dependent Speaker Embedding and Model Factorization for Multi-speaker Speech Synthesis and Adaptation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

The NLPR Speech Synthesis entry for Blizzard Challenge 2019.
Proceedings of the Blizzard Challenge 2019, Vienna, Austria, September 23, 2019, 2019

Focal Loss for End-to-end Short Utterances Chinese Dialect Identification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Batch Normalization based Unsupervised Speaker Adaptation for Acoustic Models.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Distilling Knowledge for Distant Speech Recognition via Parallel Data.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Hypersphere Embedding and Additive Margin for Query-by-example Keyword Spotting.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Noise Prior Knowledge Learning for Speech Enhancement via Gated Convolutional Generative Adversarial Network.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Voice Activity Detection Based on Time-Delay Neural Networks.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Local Second-Order Gradient Cross Pattern for Automatic Depression Detection.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, 2019

Efficient Modeling of Long Temporal Contexts for Continuous Emotion Recognition.
Proceedings of the 8th International Conference on Affective Computing and Intelligent Interaction, 2019

2018
Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin.
J. Signal Process. Syst., 2018

CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition.
J. Signal Process. Syst., 2018

Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning.
J. Signal Process. Syst., 2018

Deep Learning Based Speech Separation via NMF-Style Reconstructions.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

Investigation of Multimodal Features, Classifiers and Fusion Methods for Emotion Recognition.
CoRR, 2018

Distilling Knowledge Using Parallel Data for Far-field Speech Recognition.
CoRR, 2018

ASMMC-MMAC 2018: The Joint Workshop of 4th the Workshop on Affective Social Multimedia Computing and first Multi-Modal Affective Computing of Large-Scale Multimedia Data Workshop.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Deep Learning for Continuous Multiple Time Series Annotations.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018

Multimodal Continuous Emotion Recognition with Data Augmentation Using Recurrent Neural Networks.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018

A Novel Unified Framework for Speech Enhancement and Bandwidth Extension Based on Jointly Trained Neural Networks.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Utterance-level Permutation Invariant Training with Discriminative Learning for Single Channel Speech Separation.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

CLMAD: A Chinese Language Model Adaptation Dataset.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

On the Application and Compression of Deep Time Delay Neural Network for Embedded Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Sparsity-Constrained Weight Mapping for Head-Related Transfer Functions Individualization from Anthropometric Features.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Speech Emotion Recognition from Variable-Length Inputs with Triplet Loss Function.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Transfer Learning Based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Pen Tip Motion Prediction for Handwriting Drawing Order Recovery using Deep Neural Network.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Reducing Tongue Shape Dimensionality from Hundreds of Available Resources Using Autoencoder.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Self-Talk: Responses to Users' Opinions and Challenges in Human Computer Dialog.
Proceedings of the 24th International Conference on Pattern Recognition, 2018

Architecture and Parameter Analysis to Convolutional Neural Network for Hand Tracking.
Proceedings of the Cloud Computing and Security - 4th International Conference, 2018

Adversarial Multilingual Training for Low-Resource Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

End-to-End Continuous Emotion Recognition from Video Using 3D Convlstm Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Quantitative intonation modeling of interrogative sentences for Mandarin speech synthesis.
Speech Commun., 2017

CHEAVD: a Chinese natural emotional audio-visual database.
J. Ambient Intell. Humaniz. Comput., 2017

Nonrigid point matching of Chinese characters for robot writing.
Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics, 2017

Research on modeling and machining algorithm of multi-shear and multi-punch CNC transverse shear line.
Proceedings of the 2017 IEEE International Conference on Cybernetics and Intelligent Systems (CIS) and IEEE Conference on Robotics, Automation and Mechatronics (RAM), 2017

Continuous Multimodal Emotion Prediction Based on Long Short Term Memory Recurrent Neural Network.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Investigating Efficient Feature Representation Methods and Training Objective for BLSTM-Based Phone Duration Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Distilling Knowledge from an Ensemble of Models for Punctuation Prediction.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Domain Knowledge-Assisted Nonlinear Model for Head-Related Transfer Functions Based on Bottleneck Deep Neural Network.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A novel pitch extraction based on jointly trained deep BLSTM Recurrent Neural Networks with bottleneck features.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

The NLPR Speech Synthesis entry for Blizzard Challenge 2017.
Proceedings of the Blizzard Challenge 2017, Stockholm, Sweden, August 25, 2017, 2017

2016
Speech Enhancement Based on Analysis-Synthesis Framework with Improved Parameter Domain Enhancement.
J. Signal Process. Syst., 2016

Guest Editorial: Advances in Machine Learning for Speech Processing.
J. Signal Process. Syst., 2016

Investigating Effect of Rich Syntactic Features on Mandarin Prosodic Boundaries Prediction.
J. Signal Process. Syst., 2016

Emotional head motion predicting from prosodic and linguistic features.
Multim. Tools Appl., 2016

Audio Visual Emotion Recognition with Temporal Alignment and Perception Attention.
CoRR, 2016

Football News Generation from Chinese Live Webcast Script.
Proceedings of the Natural Language Understanding and Intelligent Applications, 2016

Text-based sentential stress prediction using continuous lexical embedding for Mandarin speech synthesis.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving accented Mandarin speech recognition by using recurrent neural network based language model adaptation.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

End-to-end keywords spotting based on connectionist temporal classification for Mandarin.
Proceedings of the 10th International Symposium on Chinese Spoken Language Processing, 2016

Improving Prosodic Boundaries Prediction for Mandarin Speech Synthesis by Using Enhanced Embedding Feature and Model Fusion Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network Based Speech Synthesis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Sparse Spherical Harmonic-Based Model in Subbands for Head-Related Transfer Functions.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

A Novel Research to Artificial Bandwidth Extension Based on Deep BLSTM Recurrent Neural Networks and Exemplar-Based Sparse Representation.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Extraction of tongue contour in real-time magnetic resonance imaging sequences.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Long short term memory recurrent neural network based encoding method for emotion recognition in video.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Recurrent Neural Network Based Language Model Adaptation for Accent Mandarin Speech.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

MEC 2016: The Multimodal Emotion Recognition Challenge of CCPR 2016.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

BLSTM Guided Unit Selection Synthesis System for Blizzard Challenge 2016.
Proceedings of the Blizzard Challenge 2016, Cupertino, CA, USA, September 16, 2016, 2016

Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

Deep neural network based voice conversion with a large synthesized parallel corpus.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015
Micro-Expression Recognition Using Color Spaces.
IEEE Trans. Image Process., 2015

Hierarchical stress modeling and generation in mandarin for expressive Text-to-Speech.
Speech Commun., 2015

User behavior fusion in dialog management with multi-modal history cues.
Multim. Tools Appl., 2015

Long Short Term Memory Recurrent Neural Network based Multimodal Dimensional Emotion Recognition.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015

Combining extreme learning machine and decision tree for duration prediction in HMM based speech synthesis.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A novel method of artificial bandwidth extension using deep architecture.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Exploring smart pilot for partial packet recovery in super dense wireless networks.
Proceedings of the IEEE International Conference on Communication, 2015

Estimate articulatory MRI series from acoustic signal using deep architecture.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Evaluation of linear regression for speaker adaptation in HMM-based articulatory movements estimation.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Voice quality: Not only about "you" but also about "your interlocutor".
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

From simulated speech to natural speech, what are the robust features for emotion recognition?
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

Multi task sequence learning for depression scale prediction from video.
Proceedings of the 2015 International Conference on Affective Computing and Intelligent Interaction, 2015

2014
Pitch-Scaled Spectrum Based Excitation Model for HMM-based Speech Synthesis.
J. Signal Process. Syst., 2014

Guest Editorial: Machine Learning for Signal Processing.
J. Signal Process. Syst., 2014

Introduction to the Issue on Statistical Parametric Speech Synthesis.
IEEE J. Sel. Top. Signal Process., 2014

Phonological influences on the realization of final lowering evidence from dialogue Chinese Mandarin.
Proceedings of the 2014 17th Oriental Chapter of the International Committee for the Co-ordination and Standardization of Speech Databases and Assessment Techniques (COCOSDA), 2014

Multi-scale Temporal Modeling for Dimensional Emotion Recognition in Video.
Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, 2014

The expression of emotions by text and speech.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Survey on discriminative feature selection for speech emotion recognition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Evaluation of parameter generation using high order dynamic features and long span windows for HMM based speech synthesis.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Context features based pre-selection and weight prediction in concatenation speech synthesis system.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Efficient voice activity detection algorithm based on sub-band temporal envelope and sub-band long-term signal variability.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Investigating effect of rich syntactic features on Mandarin prosodic phrase boundaries prediction.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Improving generation performance of speech emotion recognition by denoising autoencoders.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Combining prosodic and spectral features for Mandarin intonation recognition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

A hierarchical viterbi algorithm for Mandarin hybrid speech synthesis system.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improving Mandarin prosodic boundary prediction with rich syntactic features.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A novel hybrid mandarin speech synthesis system using different base units for model training and concatenation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Tongue shape conversion with non-parallel training data.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013
A novel unit selection method for concatenation speech system using similarity measure.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Stress predicition for Mandarin text-to-speech system using discourse context feature.
Proceedings of the 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), 2013

Extraction of tongue contour in X-ray videos.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Speaker-independent lips and tongue visualization of vowels.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

On Constructing a Chinese Task-Oriental Subjectivity Lexicon.
Proceedings of the Chinese Lexical Semantics - 14th Workshop, 2013

Combining emotional history through multimodal fusion methods.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2013

Extended Decision Tree with or Relationship for HMM-Based Speech Synthesis.
Proceedings of the 2nd IAPR Asian Conference on Pattern Recognition, 2013

Bayesian Inference Based Temporal Modeling for Naturalistic Affective Expression Classification.
Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

2012
A multimodal approach of generating 3D human-like talking agent.
J. Multimodal User Interfaces, 2012

Emotion and mental state recognition from speech.
EURASIP J. Adv. Signal Process., 2012

Statistical modification based post-filtering technique for HMM-based speech synthesis.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Pitch-Scaled Analysis based Residual Reconstruction for Speech Analysis and Synthesis.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Multimodal emotion estimation and emotional synthesize for interaction virtual agent.
Proceedings of the 2nd IEEE International Conference on Cloud Computing and Intelligence Systems, 2012

2011
Utterance independent bimodal emotion recognition in spontaneous communication.
EURASIP J. Adv. Signal Process., 2011

An outlier rejection scheme for optical flow tracking.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

An excitation model based on inverse filtering for speech analysis and synthesis.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Preface.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Animating a Chinese interactive virtual character.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

HMM-based Tianjin Dialect speech synthesis using bilateral question Set.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Hierarchical Stress Modeling in Mandarin Text-to-Speech.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

The Stability Analysis of Disyllabic Stress in Mandarin Speech.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

Global variance modeling on frequency domain delta LSP for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

The CASIA Audio Emotion Recognition Method for Audio/Visual Emotion Challenge 2011.
Proceedings of the Affective Computing and Intelligent Interaction, 2011

2010
Supervisory Data Alignment for Text-Independent Voice Conversion.
IEEE Trans. Speech Audio Process., 2010

HMM based speech synthesis with Global Variance Training method.
Proceedings of the 4th International Universal Communication Symposium, 2010

Real-time speech-driven lip synchronization.
Proceedings of the 4th International Universal Communication Symposium, 2010

The duration analysis of the checked tones in Cantonese speech.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

A novel hybrid approach for Mandarin speech synthesis.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Text-based unstressed syllable prediction in Mandarin.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Mood avatar: automatic text-driven head motion synthesis.
Proceedings of the 12th International Conference on Multimodal Interfaces / 7. International Workshop on Machine Learning for Multimodal Interaction, 2010

The WISTON Text to Speech System for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010, 2010

Does culture affect the perception of emotion in virtual faces?
Proceedings of the 7th Symposium on Applied Perception in Graphics and Visualization, 2010

2009
Realistic Visual Speech Synthesis Based on Hybrid Concatenation Method.
IEEE Trans. Speech Audio Process., 2009

Dimension reducing of LSF parameters based on radial basis function neural network.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Prosody modeling for mandarin exclamatory speech.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

Prediction of Ground Water Level Based on DE-BP Neutral Network.
Proceedings of the 2009 International Conference on Environmental Science and Information Application Technology, 2009

The WISTON Text-to-Speech System for Blizzard Challenge 2009.
Proceedings of the Blizzard Challenge 2009, Edinburgh, Scotland, UK, September 4, 2009, 2009

Categorizing terms' subjectivity and polarity manually for opinion mining in Chinese.
Proceedings of the Affective Computing and Intelligent Interaction, 2009

A multiple perception model on emotional speech.
Proceedings of the Affective Computing and Intelligent Interaction, 2009

Face Animation Based on Large Audiovisual Database.
Proceedings of the Affective Information Processing, 2009

Emotional Speech Generation by Using Statistic Prosody Conversion Methods.
Proceedings of the Affective Information Processing, 2009

2008
Improving HMM Based Speech Synthesis by Reducing Over-Smoothing Problems.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

Prosody Modification on Mixed-Language Speech Synthesis.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A Maximum Entropy Based Hierarchical Model for Automatic Prosodic Boundary Labeling in Mandarin.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

A Novel Classifier Based on Enhanced Lipschitz Embedding for Speech Emotion Recognition.
Proceedings of the Advanced Intelligent Computing Theories and Applications. With Aspects of Theoretical and Methodological Issues, 2008

Text-independent voice conversion based on state mapped codebook.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Tree-guided transformation-based homograph disambiguation in Mandarin TTS system.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

The WISTON Text to Speech System for Blizzard 2008.
Proceedings of the Blizzard Challenge 2008, 2008

2007
Manifolds Based Emotion Recognition in Speech.
Int. J. Comput. Linguistics Chin. Lang. Process., 2007

Development of an integrated model for assessing the impact of diffuse and point source pollution on coastal waters.
Environ. Model. Softw., 2007

Modeling incompletion phenomenon in Mandarin dialog prosody.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Speech Emotion Recognition using an Enhanced Co-Training Algorithm.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Dynamic Audio-Visual Mapping using Fused Hidden Markov Model Inversion Method.
Proceedings of the International Conference on Image Processing, 2007

Speech Emotion Recognition Based on a Fusion of All-Class and Pairwise-Class Feature Selection.
Proceedings of the Computational Science, 2007

A Novel HMM-Based TTS System using Both Continuous HMMS and Discrete HMMS.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

Expressive Face Animation Synthesis Based on Dynamic Mapping Method.
Proceedings of the Affective Computing and Intelligent Interaction, 2007

What Should a Generic Emotion Markup Language Be Able to Represent?
Proceedings of the Affective Computing and Intelligent Interaction, 2007

Combining Audio and Video by Dominance in Bimodal Emotion Recognition.
Proceedings of the Affective Computing and Intelligent Interaction, 2007

2006
Prosody conversion from neutral speech to emotional speech.
IEEE Trans. Speech Audio Process., 2006

Pitch Prediction for Mandarin TTS with Mutual Prosodic Constraint.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Nonlinear Emotional Prosody Generation and Annotation.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Prosodic Word Prediction Using a Maximum Entropy Approach.
Proceedings of the Chinese Spoken Language Processing, 5th International Symposium, 2006

Emotional Speech Analysis on Nonlinear Manifold.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Emotion Recognition from Noisy Speech.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

A New Pitch Generation Model Based on Internal Dependence of Pitch Contour for Mandarin TTS System.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Applying Pitch Target Model to Convert F0 Contour for Expressive Mandarin Speech Synthesis.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

2005
Chinese prosodic phrasing with a constraint-based approach.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Automatic 3D Face Modeling from Video.
Proceedings of the 10th IEEE International Conference on Computer Vision (ICCV 2005), 2005

Dynamic Mapping Method Based Speech Driven Face Animation System.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

Affective Computing: A Review.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

Features Importance Analysis for Emotional Speech Classification.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

Personalized Facial Animation Based on 3D Model Fitting from Two Orthogonal Face Images.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

A Hybrid GMM and Codebook Mapping Method for Spectral Conversion.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

2004
F0 Prediction Model of Speech Synthesis Based on Template and Statistical Method.
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004

Acoustic and Linguistic Information Based Chinese Prosodic Boundary Labelling.
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004

Multi-source based acoustic model for speech synthesis.
Proceedings of the Fifth ISCA ITRW on Speech Synthesis, 2004

Rhythm correlation of speech synthesis system.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

Grapheme-to-phoneme conversion in Chinese TTS system.
Proceedings of the 2004 International Symposium on Chinese Spoken Language Processing, 2004

A new multicomponent AM-FM demodulation with predicting frequency boundaries and its application to formant estimation.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Context based emotion detection from text input.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

Emotional Chinese talking head system.
Proceedings of the 6th International Conference on Multimodal Interfaces, 2004

2003
Emotion control of Chinese speech synthesis in natural environment.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Chinese prosodic phrasing with extended features.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003

Auditive learning based Chinese F0 prediction.
Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003

2002
Voice quality analysis under the pitch effect.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Automatic stress prediction of Chinese speech synthesis.
Proceedings of the 2002 International Symposium on Chinese Spoken Language Processing, 2002

Prosodic phrasing with inductive learning.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Clustering and feature learning based F0 prediction for Chinese speech synthesis.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Music type classification by spectral contrast feature.
Proceedings of the 2002 IEEE International Conference on Multimedia and Expo, 2002

Learning Rules for Chinese Prosodic Phrase Prediction.
Proceedings of the First Workshop on Chinese Language Processing, 2002

2000
Data-driven importance analysis of linguistic and phonetic information.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

1998
The Statistical Model of Chinese Word Contours Based on Fuzzy.
Proceedings of the 1998 International Symposium on Chinese Spoken Language Processing, 1998

