A Systematic Survey of Automatic Prompt Optimization Techniques.
CoRR, February, 2025
Modality mixer exploiting complementary information for multi-modal action recognition.
Comput. Vis. Image Underst., 2025
Black-Box Visual Prompt Engineering for Mitigating Object Hallucination in Large Vision Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Diffusion Model Patching via Mixture-of-Prompts.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Parameter Efficient Mamba Tuning via Projector-targeted Diagonal-centric Linear Transformation.
CoRR, 2024
RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs.
CoRR, 2024
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models.
CoRR, 2024
Sketch-based Video Object Localization.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
Denoising Task Routing for Diffusion Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts.
Proceedings of the Computer Vision - ECCV 2024, 2024
Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024
Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024
HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Tackling the Challenges in Scene Graph Generation With Local-to-Global Interactions.
IEEE Trans. Neural Networks Learn. Syst., December, 2023
Cross-modal alignment and translation for missing modality action recognition.
Comput. Vis. Image Underst., November, 2023
Modality Mixer for Multi-modal Action Recognition.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
AHFu-Net: Align, Hallucinate, and Fuse Network for Missing Multimodal Action Recognition.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2023
Multi-modal Social Group Activity Recognition in Panoramic Scene.
Proceedings of the IEEE International Conference on Visual Communications and Image Processing, 2023
Audio-Visual Glance Network for Efficient Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Towards Good Practices for Missing Modality Robust Action Recognition.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
Explore and Match: End-to-End Video Grounding with Transformer.
CoRR, 2022
Temporal Flow Mask Attention for Open-Set Long-Tailed Recognition of Wild Animals in Camera-Trap Images.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022
What and When to Look?: Temporal Span Proposal Network for Video Visual Relation Detection.
CoRR, 2021