2025
A Systematic Survey of Automatic Prompt Optimization Techniques.
CoRR, February, 2025

Modality mixer exploiting complementary information for multi-modal action recognition.
Comput. Vis. Image Underst., 2025

Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025

Diffusion Model Patching via Mixture-of-Prompts.
Proceedings of AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation.
CoRR, 2024

RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs.
CoRR, 2024

Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models.
CoRR, 2024

Sketch-based Video Object Localization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Denoising Task Routing for Diffusion Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts.
Proceedings of Computer Vision - ECCV 2024, 2024

Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition.
Proceedings of Computer Vision - ECCV 2024, 2024

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition.
Proceedings of Computer Vision - ECCV 2024, 2024

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Tackling the Challenges in Scene Graph Generation With Local-to-Global Interactions.
IEEE Trans. Neural Networks Learn. Syst., December, 2023

Cross-modal alignment and translation for missing modality action recognition.
Comput. Vis. Image Underst., November, 2023

Modality Mixer for Multi-modal Action Recognition.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

AHFu-Net: Align, Hallucinate, and Fuse Network for Missing Multimodal Action Recognition.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2023

Multi-modal Social Group Activity Recognition in Panoramic Scene.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2023

Audio-Visual Glance Network for Efficient Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards Good Practices for Missing Modality Robust Action Recognition.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Explore and Match: End-to-End Video Grounding with Transformer.
CoRR, 2022

Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

2021
What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection.
CoRR, 2021