Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
ATV3D: 3D Object Detection from Attention-based Three-view Representation.
Proceedings of the International Joint Conference on Neural Networks, 2024
Enhancing Visual Wake Word Spotting with Pretrained Model and Feature Balance Scaling.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Common Sense Language-Guided Exploration and Hierarchical Dense Perception for Instruction Following Embodied Agents.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
A Method of Audio-Visual Person Verification by Mining Connections between Time Series.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Learning Audio-Visual embedding for Wild Person Verification.
CoRR, 2022
MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
ICPR 2022 Challenge on Multi-Modal Subtitle Recognition.
Proceedings of the 26th International Conference on Pattern Recognition, 2022
Fake Audio Detection Based On Unsupervised Pretraining Models.
Proceedings of the IEEE International Conference on Acoustics, 2022
PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-modeling Unit Training for Robust Uyghur E2E Speech Recognition.
CoRR, 2021
VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge.
CoRR, 2021
The TNT Team System Descriptions of Cantonese and Mongolian for IARPA OpenASR20.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Investigating the Use of Mixed-Units Based Modeling for Improving Uyghur Speech Recognition.
Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018