2024
DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments.
CoRR, 2024

SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining.
CoRR, 2024

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic.
CoRR, 2024

MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

PMI Sampler: Patch Similarity Guided Frame Selection For Aerial Action Recognition.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

AGL-Net: Aerial-Ground Cross-Modal Global Localization with Varying Scales.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

SCP: Soft Conditional Prompt Learning for Aerial Video Action Recognition.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2024

AutoHallusion: Automatic Generation of Hallucination Benchmarks for Vision-Language Models.
Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

ViLA: Efficient Video-Language Alignment for Video Question Answering.
Proceedings of Computer Vision - ECCV 2024, 2024

HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ICAR: Image-Based Complementary Auto Reasoning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
VLAP: Efficient Video-Language Alignment via Frame Prompting and Distilling for Video Question Answering.
CoRR, 2023

Triplet Knowledge Distillation.
CoRR, 2023

Prompt Learning for Action Recognition.
CoRR, 2023

AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Small-shot Multi-modal Distillation for Vision-based Autonomous Steering.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

METEOR: A Dense, Heterogeneous, and Unstructured Traffic Dataset with Rare Behaviors.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Auxiliary Modality Learning with Generalized Curriculum Distillation.
Proceedings of the International Conference on Machine Learning, 2023

SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
Fourier Disentangled Space-Time Attention for Aerial Video Recognition.
CoRR, 2022

FAR: Fourier Aerial Video Recognition.
Proceedings of Computer Vision - ECCV 2022, 2022

2021
Dynamic Region-Aware Convolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019
Fully Learnable Group Convolution for Acceleration of Deep Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019