2025

Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition.

[DOI]

Huimeng Wang

Xurong Xie

CoRR, January, 2025

Effective and Efficient Mixed Precision Quantization of Speech Foundation Models.

[DOI]

CoRR, January, 2025

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer.

[DOI]

CoRR, January, 2025

Whole slide image based deep learning refines prognosis and therapeutic response evaluation in lung adenocarcinoma.

[DOI]

npj Digit. Medicine, 2025

2024

Recurrent Generic Contour-Based Instance Segmentation With Progressive Learning.

[DOI]

IEEE Trans. Circuits Syst. Video Technol., September, 2024

HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation.

[DOI]

Int. J. Comput. Vis., July, 2024

Guest Editorial Introduction to the Issue on Pre-Trained Models for Multi-Modality Understanding.

[DOI]

IEEE Trans. Multim., 2024

Deep Unrestricted Document Image Rectification.

[DOI]

IEEE Trans. Multim., 2024

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Self-Supervised ASR Models and Features for Dysarthric and Elderly Speech Recognition.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition.

[DOI]

CoRR, 2024

RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion.

[DOI]

CoRR, 2024

Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction.

[DOI]

CoRR, 2024

Exploring SSL Discrete Speech Features for Zipformer-based Contextual ASR.

[DOI]

CoRR, 2024

Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation.

[DOI]

CoRR, 2024

Described Spatial-Temporal Video Detection.

[DOI]

CoRR, 2024

One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model.

[DOI]

CoRR, 2024

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition.

[DOI]

CoRR, 2024

Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask.

[DOI]

CoRR, 2024

PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest.

[DOI]

CoRR, 2024

DeepEraser: Deep Iterative Context Mining for Generic Text Eraser.

[DOI]

CoRR, 2024

FARFusion V2: A Geometry-based Radar-Camera Fusion Method on the Ground for Roadside Far-Range 3D Object Detection.

[DOI]

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

RayFormer: Improving Query-Based Multi-Camera 3D Object Detection via Ray-Centric Strategies.

[DOI]

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

End-to-End Automatic Singing Skill Evaluation Using Cross-Attention and Data Augmentation for Solo Singing and Singing With Accompaniment.

[DOI]

Yaolong Ju

Chun Yat Wu

Betty Cortiñas-Lorenzo

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking With Self-Supervised Learning Features.

[DOI]

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Towards Automatic Data Augmentation for Disordered Speech Recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Towards High-Performance and Low-Latency Feature-Based Speaker Adaptation of Conformer Speech Recognition Systems.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2024

Agent3D-Zero: An Agent for Zero-Shot 3D Understanding.

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

End-to-End Rate-Distortion Optimized 3D Gaussian Representation.

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Hierarchical Temporal Context Learning for Camera-Based Semantic Scene Completion.

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Revisiting Open-Set Panoptic Segmentation.

[DOI]

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Cycle-Consistency Learning for Captioning and Grounding.

[DOI]

Ning Wang

Jiajun Deng

Mingbo Jia

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

TransVG++: End-to-End Visual Grounding With Language Conditioned Vision Transformer.

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Multi-Modal 3D Object Detection in Autonomous Driving: A Survey.

[DOI]

Int. J. Comput. Vis., August, 2023

NRAP: Nearest Reliable Anchors-Based Wireless Positioning for Irregular Multi-hop Networks.

[DOI]

Wirel. Pers. Commun., April, 2023

Masked Contrastive Representation Learning for Reinforcement Learning.

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., March, 2023

VPFNet: Improving 3D Object Detection With Virtual Point Based LiDAR and Stereo Data Fusion.

[DOI]

IEEE Trans. Multim., 2023

FI-WSOD: Foreground Information Guided Weakly Supervised Object Detection.

[DOI]

IEEE Trans. Multim., 2023

Hybrid Motion Representation Learning for Prediction From Raw Sensor Data.

[DOI]

IEEE Trans. Multim., 2023

Audio-Visual End-to-End Multi-Channel Speech Separation, Dereverberation and Recognition.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Confidence Score Based Speaker Adaptation of Conformer Speech Recognition Systems.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2023

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection.

[DOI]

Int. J. Comput. Vis., 2023

FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin.

[DOI]

CoRR, 2023

I²MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation.

[DOI]

CoRR, 2023

Recurrent Contour-based Instance Segmentation with Progressive Learning.

[DOI]

CoRR, 2023

OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection.

[DOI]

CoRR, 2023

CluB: Cluster Meets BEV for LiDAR-Based 3D Object Detection.

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CLIP4HOI: Towards Adapting CLIP for Practical Zero-Shot HOI Detection.

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fed-CoT: Co-teachers for Federated Semi-supervised MS Lesion Segmentation.

[DOI]

Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2023 Workshops, 2023

Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Lossless 4-bit Quantization of Architecture Compressed Conformer ASR Systems on the 300-hr Switchboard Corpus.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploiting Cross-Domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Use of Speech Impairment Severity for Dysarthric Speech Recognition.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Factorised Speaker-environment Adaptive Training of Conformer Speech Recognition Systems.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Towards Effective and Compact Contextual Representation for Conformer Transducer Speech Recognition Systems.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection.

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Invariant Training 2D-3D Joint Hard Samples for Few-Shot Point Cloud Recognition.

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection.

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Masked Motion Predictors are Strong 3D Action Representation Learners.

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SimFIR: A Simple Framework for Fisheye Image Rectification with Self-supervised Representation Learning.

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploiting Prompt Learning with Pre-Trained Language Models for Alzheimer's Disease Detection.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Exploring Self-Supervised Pre-Trained ASR Models for Dysarthric and Elderly Speech Recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2023

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection.

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Weakly Supervised Temporal Adjacent Network for Language Grounding.

[DOI]

IEEE Trans. Multim., 2022

Neural Architecture Search for LF-MMI Trained Time Delay Neural Networks.

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2022

EZFusion: A Close Look at the Integration of LiDAR, Millimeter-Wave Radar, and Camera for Accurate 3D Object Detection and Tracking.

[DOI]

IEEE Robotics Autom. Lett., 2022

Intelligent Spraying Water Based on the Internet of Orchard Things and Fuzzy PID Algorithms.

[DOI]

J. Sensors, 2022

Development and validation of a deep learning model to predict the survival of patients in ICU.

[DOI]

J. Am. Medical Informatics Assoc., 2022

Exploiting Cross-domain And Cross-Lingual Ultrasound Tongue Imaging Features For Elderly And Dysarthric Speech Recognition.

[DOI]

CoRR, 2022

Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Confidence Score Based Conformer Speaker Adaptation for Speech Recognition.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Two-pass Decoding and Cross-adaptation Based System Combination of End-to-end Conformer and Hybrid TDNN ASR Systems.

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Pulmonary Nodule Classification with Multi-View Convolutional Vision Transformer.

[DOI]

Proceedings of the International Joint Conference on Neural Networks, 2022

Audio-Visual Multi-Channel Speech Separation, Dereverberation and Recognition.

[DOI]

Proceedings of the IEEE International Conference on Acoustics, 2022

CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation.

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Geometric Representation Learning for Document Image Rectification.

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

Residual Refinement Network with Attribute Guidance for Precise Saliency Detection.

[DOI]

ACM Trans. Multim. Comput. Commun. Appl., 2021

Single Shot Video Object Detector.

[DOI]

IEEE Trans. Multim., 2021

MINet: Meta-Learning Instance Identifiers for Video Object Detection.

[DOI]

IEEE Trans. Image Process., 2021

From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection.

[DOI]

IEEE Trans. Circuits Syst. Video Technol., 2021

DocScanner: Robust Document Image Rectification with Progressive Learning.

[DOI]

CoRR, 2021

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction.

[DOI]

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting.

[DOI]

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Bayesian Parametric and Architectural Domain Adaptation of LF-MMI Trained TDNNs for Elderly and Dysarthric Speech Recognition.

[DOI]

Jiajun Deng

Fabian Ritter Gutierrez

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TransVG: End-to-End Visual Grounding with Transformers.

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Instance Mining with Class Feature Banks for Weakly Supervised Object Detection.

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection.

[DOI]

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Masked Contrastive Representation Learning for Reinforcement Learning.

[DOI]

CoRR, 2020

Adaptive Offline Quintuplet Loss for Image-Text Matching.

[DOI]

Tianlang Chen

Jiajun Deng

Jiebo Luo

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Symbol-level Precoding for Multiuser Visible Light Communication.

[DOI]

Proceedings of the 11th International Conference on Wireless Communications and Signal Processing, 2019

Self-Reproducing Video Frame Interpolation.

[DOI]

Proceedings of the 2nd IEEE Conference on Multimedia Information Processing and Retrieval, 2019

Relation Distillation Networks for Video Object Detection.

[DOI]

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Graph-Based Scheduling for Cooperative Transmission in Indoor VLC Systems.

[DOI]

Proceedings of the 17th IEEE International Conference on Communications Workshops, 2019

2013

Information theory based region of interest extraction scheme with perceptual stimulus-response model.

[DOI]

Proceedings of the 24th IEEE Annual International Symposium on Personal, 2013