ImageNetVC: Zero-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories.
CoRR, 2023
ImageNetVC: Zero- and Few-Shot Visual Commonsense Evaluation on 1000 ImageNet Categories.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Denoising Bottleneck with Mutual Information Maximization for Video Multimodal Fusion.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Premise-based Multimodal Reasoning: A Human-like Cognitive Process.
CoRR, 2021