Jiaen Liang

ORCID: 0009-0001-8309-1301

According to our database, Jiaen Liang authored at least 46 papers between 2006 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

RAG-Guided Large Language Models for Visual Spatial Description with Adaptive Hallucination Corrector.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Temporal-Informative Adapters in VideoMAE V2 and Multi-Scale Feature Fusion for Micro-Expression Spotting-then-Recognize.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

End-to-end Spatio-Temporal Information Aggregation For Micro-Action Detection.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Building Robust Video-Level Deepfake Detection via Audio-Visual Local-Global Interactions.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Micro-Expression Spotting Based on Optical Flow Feature with Boundary Calibration.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Efficient Feature Extraction and Late Fusion Strategy for Audiovisual Emotional Mimicry Intensity Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Multi Model Ensemble for Compound Expression Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AUD-TGN: Advancing Action Unit Detection with Temporal Convolution and GPT-2 in Wild Audiovisual Contexts.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Improving Valence-Arousal Estimation with Spatiotemporal Relationship Learning and Multimodal Fusion.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Exploring Facial Expression Recognition through Semi-Supervised Pre-training and Temporal Modeling.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Dual-model self-regularization and fusion for domain adaptation of robust speaker verification.

Speech Commun., November, 2023

M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.

CoRR, 2023

MMT-GD: Multi-Modal Transformer with Graph Distillation for Cross-Cultural Humor Detection.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Sliding Window Seq2seq Modeling for Engagement Estimation.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Answer-Based Entity Extraction and Alignment for Visual Text Question Answering.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

M²-CTTS: End-to-End Multi-Scale Multi-Modal Conversational Text-to-Speech Synthesis.

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Acoustic domain mismatch compensation in bird audio detection.

Int. J. Speech Technol., 2022

Exploring single channel speech separation for short-time text-dependent speaker verification.

Int. J. Speech Technol., 2022

Joint framework with deep feature distillation and adaptive focal loss for weakly supervised audio tagging and acoustic event detection.

Digit. Signal Process., 2022

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis.

Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

Selective Pseudo-labeling and Class-wise Discriminative Fusion for Sound Event Detection.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Joint Weakly Supervised AT and AED Using Deep Feature Distillation and Adaptive Focal Loss.

CoRR, 2021

Attention-Based Scaling Adaptation for Target Speech Extraction.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

CNN-based Discriminative Training for Domain Compensation in Acoustic Event Detection with Frame-wise Classifier.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Mask-based blind source separation and MVDR beamforming in ASR.

Int. J. Speech Technol., 2020

Attention-based scaling adaptation for target speech extraction.

CoRR, 2020

Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-Based LVCSR.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speech Driven Talking Head Generation via Attentional Landmarks Based Representation.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

The SHNU System for Blizzard Challenge 2020.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

2019

Speaker Direction-of-Arrival Estimation Based on Orthogonal Dipoles.

Zhaoqiong Huang

Circuits Syst. Signal Process., 2019

2018

Active Learning for LF-MMI Trained Neural Networks in ASR.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Frequency-invariant differential microphone array design in the STFT domain.

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2011

Exploring nuisance attribute projection and score normalization for GLDS-SVM based automatic mispronunciation detection method.

Proceedings of the IEEE International Conference on Acoustics, 2011

2010

Exploring goodness of prosody by diverse matching templates.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Automatic reference independent evaluation of prosody quality using multiple knowledge fusions.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

2009

High performance automatic mispronunciation detection method based on neural network and TRAP features.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

An efficient mispronunciation detection method using GLDS-SVM and formant enhanced features.

Proceedings of the IEEE International Conference on Acoustics, 2009

Context Dependent Feature Based Bottom-up Rescoring SVM Classifier in Children's English Stress Mis-pronunciation Detection.

Proceedings of the 9th IEEE International Conference on Advanced Learning Technologies, 2009

2008

Improving searching speed and accuracy of query by humming system based on three methods: feature fusion, candidates set reduction and multiple similarity measurement rescoring.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Music Genre Classification Based on Multiple Classifier Fusion.

Proceedings of the Fourth International Conference on Natural Computation, 2008

Improved phonotactic language identification using random forest language models.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

A Novel Phone-State Matrix Based Vocabulary-Independent Keyword Spotting Method for Spontaneous Speech.

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Full Utilization of Closed-captions in Broadcast News Recognition.

Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

An Improved Mandarin Keyword Spotting System Using MCE Training and Context-Enhanced Verification.

Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
