2025
Where do Large Vision-Language Models Look at when Answering Questions?
CoRR, March, 2025
2024
DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models.
CoRR, 2024
AIPO: Improving Training Objective for Iterative Preference Optimization.
CoRR, 2024
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies.
CoRR, 2024
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization.
CoRR, 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024
Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023
Prompt-aligned Gradient for Prompt Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
In Defense of Structural Symbolic Representation for Video Event-Relation Prediction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Multimodal Event Graphs: Towards Event Centric Understanding of Multimodal World.
CoRR, 2022
Respecting Transfer Gap in Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
On Non-Random Missing Labels in Semi-Supervised Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Interventional Training for Out-Of-Distribution Natural Language Understanding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Weakly-Supervised Temporal Article Grounding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
NICO Challenge: Out-of-Distribution Generalization for Image Recognition Challenges.
Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022
Explicit Image Caption Editing.
Proceedings of the Computer Vision - ECCV 2022, 2022
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Cross-Domain Empirical Risk Minimization for Unbiased Long-Tailed Classification.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions.
IEEE Trans. Pattern Anal. Mach. Intell., 2021
Domain-Adaptive Few-Shot Learning.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021
Introspective Distillation for Robust Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Counterfactual VQA: A Cause-Effect Look at Language Bias.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
COSY: COunterfactual SYntax for Cross-Lingual Understanding.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
2020
Counterfactual Variable Control for Robust and Interpretable Question Answering.
CoRR, 2020
Domain-Adaptive Few-Shot Learning.
CoRR, 2020
Lightweight Action Recognition in Compressed Videos.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020
Unbiased Scene Graph Generation From Biased Training.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Two Causal Principles for Improving Visual Dialog.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation.
IEEE Trans. Image Process., 2019
Mobile Video Action Recognition.
CoRR, 2019
Coarse-to-Fine Grained Classification.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019
Recursive Visual Attention in Visual Dialog.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
Grounding Referring Expressions in Images by Variational Context.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
2017
Graph-boosted convolutional neural networks for semantic segmentation.
Proceedings of the 2017 International Joint Conference on Neural Networks, 2017
FeaBoost: Joint Feature and Label Refinement for Semantic Segmentation.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2015
Weakly Supervised Matrix Factorization for Noisily Tagged Image Parsing.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015