2024

Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition.

[DOI]

Qijie Shao

Pengcheng Guo

Jinghao Yan

Pengfei Hu

Lei Xie

IEEE ACM Trans. Audio Speech Lang. Process., 2024

ATV3D: 3D Object Detection from Attention-based Three-view Representation.

[DOI]

Proceedings of the International Joint Conference on Neural Networks, 2024

Enhancing Visual Wake Word Spotting with Pretrained Model and Feature Balance Scaling.

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Common Sense Language-Guided Exploration and Hierarchical Dense Perception for Instruction Following Embodied Agents.

[DOI]

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

2023

A Method of Audio-Visual Person Verification by Mining Connections between Time Series.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

Learning Audio-Visual Embedding for Wild Person Verification.

[DOI]

CoRR, 2022

MFA-Conformer: Multi-scale Feature Aggregation Conformer for Automatic Speaker Verification.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-Modeling Unit Training for Phonetic-Reduction-Robust E2E Speech Recognition.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ICPR 2022 Challenge on Multi-Modal Subtitle Recognition.

[DOI]

Proceedings of the 26th International Conference on Pattern Recognition, 2022

Fake Audio Detection Based On Unsupervised Pretraining Models.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021

PM-MMUT: Boosted Phone-mask Data Augmentation using Multi-Modeling Unit Training for Robust Uyghur E2E Speech Recognition.

[DOI]

CoRR, 2021

VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge.

[DOI]

CoRR, 2021

The TNT Team System Descriptions of Cantonese and Mongolian for IARPA OpenASR20.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition.

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2019

Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin.

[DOI]

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018

Investigating the Use of Mixed-Units Based Modeling for Improving Uyghur Speech Recognition.

[DOI]

Pengfei Hu

Shen Huang

Zhiqiang Lv

Proceedings of the 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages, 2018