2025

Where do Large Vision-Language Models Look at when Answering Questions?

CoRR, March, 2025

2024

DiffLM: Controllable Synthetic Data Generation via Diffusion Language Models.

CoRR, 2024

AIPO: Improving Training Objective for Iterative Preference Optimization.

CoRR, 2024

Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies.

CoRR, 2024

WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization.

CoRR, 2024

SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unveiling Narrative Reasoning Limits of Large Language Models with Trope in Movie Synopses.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos.

,

,

Hammad A. Ayyubi

,

Proceedings of the Computer Vision - ECCV 2024, 2024

Beyond Grounding: Extracting Fine-Grained Event Hierarchies across Modalities.

Hammad A. Ayyubi

,

Christopher Thomas

,

,

,

,

,

,

,

,

,

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering.

IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

Prompt-aligned Gradient for Prompt Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

In Defense of Structural Symbolic Representation for Video Event-Relation Prediction.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Debiased Fine-Tuning for Vision-Language Models by Prompt Regularization.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Multimodal Event Graphs: Towards Event Centric Understanding of Multimodal World.

Hammad A. Ayyubi

,

Christopher Thomas

,

,

,

,

,

,

,

,

CoRR, 2022

Respecting Transfer Gap in Knowledge Distillation.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On Non-Random Missing Labels in Semi-Supervised Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Interventional Training for Out-Of-Distribution Natural Language Understanding.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Weakly-Supervised Temporal Article Grounding.

,

,

,

,

,

Christopher Thomas

,

Hammad A. Ayyubi

,

,

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

NICO Challenge: Out-of-Distribution Generalization for Image Recognition Challenges.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Explicit Image Caption Editing.

Proceedings of the Computer Vision - ECCV 2022, 2022

Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross-Domain Empirical Risk Minimization for Unbiased Long-Tailed Classification.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions.

IEEE Trans. Pattern Anal. Mach. Intell., 2021

Domain-Adaptive Few-Shot Learning.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Introspective Distillation for Robust Question Answering.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Counterfactual VQA: A Cause-Effect Look at Language Bias.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

COSY: COunterfactual SYntax for Cross-Lingual Understanding.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

Counterfactual Variable Control for Robust and Interpretable Question Answering.

CoRR, 2020

Domain-Adaptive Few-Shot Learning.

CoRR, 2020

Lightweight Action Recognition in Compressed Videos.

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Unbiased Scene Graph Generation From Biased Training.

,

,

Jianqiang Huang

,

,

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Two Causal Principles for Improving Visual Dialog.

,

,

Jianqiang Huang

,

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation.

IEEE Trans. Image Process., 2019

Mobile Video Action Recognition.

CoRR, 2019

Coarse-to-Fine Grained Classification.

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019

Recursive Visual Attention in Visual Dialog.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Grounding Referring Expressions in Images by Variational Context.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Graph-boosted convolutional neural networks for semantic segmentation.

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

FeaBoost: Joint Feature and Label Refinement for Semantic Segmentation.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2015

Weakly Supervised Matrix Factorization for Noisily Tagged Image Parsing.

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015