2025
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition.
CoRR, January, 2025
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models.
CoRR, January, 2025
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer.
CoRR, January, 2025
Whole slide image based deep learning refines prognosis and therapeutic response evaluation in lung adenocarcinoma.
npj Digit. Medicine, 2025
2024
Recurrent Generic Contour-Based Instance Segmentation With Progressive Learning.
IEEE Trans. Circuits Syst. Video Technol., September, 2024
HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation.
Int. J. Comput. Vis., July, 2024
Guest Editorial Introduction to the Issue on Pre-Trained Models for Multi-Modality Understanding.
IEEE Trans. Multim., 2024
Deep Unrestricted Document Image Rectification.
IEEE Trans. Multim., 2024
Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2024
Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition.
CoRR, 2024
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion.
CoRR, 2024
Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction.
CoRR, 2024
Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.
CoRR, 2024
Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation.
CoRR, 2024
Described Spatial-Temporal Video Detection.
CoRR, 2024
One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model.
CoRR, 2024
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition.
CoRR, 2024
Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask.
CoRR, 2024
PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest.
CoRR, 2024
DeepEraser: Deep Iterative Context Mining for Generic Text Eraser.
CoRR, 2024
FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
End-to-End Automatic Singing Skill Evaluation Using Cross-Attention and Data Augmentation for Solo Singing and Singing With Accompaniment.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024
Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking With Self-Supervised Learning Features.
Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024
Towards Automatic Data Augmentation for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.
Proceedings of the IEEE International Conference on Acoustics, 2024
Agent3D-Zero: An Agent for Zero-Shot 3D Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024
End-to-End Rate-Distortion Optimized 3D Gaussian Representation.
Proceedings of the Computer Vision - ECCV 2024, 2024
Hierarchical Temporal Context Learning for Camera-Based Semantic Scene Completion.
Proceedings of the Computer Vision - ECCV 2024, 2024
Revisiting Open-Set Panoptic Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Cycle-Consistency Learning for Captioning and Grounding.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
TransVG++: End-to-End Visual Grounding With Language Conditioned Vision Transformer.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023
Multi-Modal 3D Object Detection in Autonomous Driving: A Survey.
Int. J. Comput. Vis., August, 2023
NRAP: Nearest Reliable Anchors-Based Wireless Positioning for Irregular Multi-hop Networks.
Wirel. Pers. Commun., April, 2023
Masked Contrastive Representation Learning for Reinforcement Learning.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2023
VPFNet: Improving 3D Object Detection With Virtual Point Based LiDAR and Stereo Data Fusion.
IEEE Trans. Multim., 2023
FI-WSOD: Foreground Information Guided Weakly Supervised Object Detection.
IEEE Trans. Multim., 2023
Hybrid Motion Representation Learning for Prediction From Raw Sensor Data.
IEEE Trans. Multim., 2023
Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection.
Int. J. Comput. Vis., 2023
FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin.
CoRR, 2023
I²MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation.
CoRR, 2023
Recurrent Contour-based Instance Segmentation with Progressive Learning.
CoRR, 2023
OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection.
CoRR, 2023
CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Fed-CoT: Co-teachers for Federated Semi-supervised MS Lesion Segmentation.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023 Workshops, 2023
Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Use of Speech Impairment Severity for Dysarthric Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Masked Motion Predictors are Strong 3D Action Representation Learners.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Exploiting Prompt Learning with Pre-Trained Language Models for Alzheimer's Disease Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023
Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
Weakly Supervised Temporal Adjacent Network for Language Grounding.
IEEE Trans. Multim., 2022
Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2022
EZFusion: A Close Look at the Integration of LiDAR, Millimeter-Wave Radar, and Camera for Accurate 3D Object Detection and Tracking.
IEEE Robotics Autom. Lett., 2022
Intelligent Spraying Water Based on the Internet of Orchard Things and Fuzzy PID Algorithms.
J. Sensors, 2022
Development and validation of a deep learning model to predict the survival of patients in ICU.
J. Am. Medical Informatics Assoc., 2022
Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.
CoRR, 2022
Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Pulmonary Nodule Classification with Multi-View Convolutional Vision Transformer.
Proceedings of the International Joint Conference on Neural Networks, 2022
Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022
CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation.
Proceedings of the Computer Vision - ECCV 2022, 2022
Geometric Representation Learning for Document Image Rectification.
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
Residual Refinement Network with Attribute Guidance for Precise Saliency Detection.
ACM Trans. Multim. Comput. Commun. Appl., 2021
Single Shot Video Object Detector.
IEEE Trans. Multim., 2021
MINet: Meta-Learning Instance Identifiers for Video Object Detection.
IEEE Trans. Image Process., 2021
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection.
IEEE Trans. Circuits Syst. Video Technol., 2021
DocScanner: Robust Document Image Rectification with Progressive Learning.
CoRR, 2021
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
TransVG: End-to-End Visual Grounding with Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Instance Mining with Class Feature Banks for Weakly Supervised Object Detection.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Masked Contrastive Representation Learning for Reinforcement Learning.
CoRR, 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching.
Proceedings of the Computer Vision - ECCV 2020, 2020
2019
Symbol-level Precoding for Multiuser Visible Light Communication.
Proceedings of the 11th International Conference on Wireless Communications and Signal Processing, 2019
Self-Reproducing Video Frame Interpolation.
Proceedings of the 2nd IEEE Conference on Multimedia Information Processing and Retrieval, 2019
Relation Distillation Networks for Video Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Graph-Based Scheduling for Cooperative Transmission in Indoor VLC Systems.
Proceedings of the 17th IEEE International Conference on Communications Workshops, 2019
2013
Information theory based region of interest extraction scheme with perceptual stimulus-response model.
Proceedings of the 24th IEEE Annual International Symposium on Personal, 2013