2025
LiveVQA: Live Visual Knowledge Seeking.
CoRR, April 2025

GMValuator: Similarity-based Data Valuation for Generative Models.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

Interleaved Scene Graphs for Interleaved Text-and-Image Generation Assessment.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment.
CoRR, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model.
CoRR, 2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache.
Proceedings of the European Conference on Computer Vision (ECCV 2024), 2024

2023
Matching-based Data Valuation for Generative Model.
CoRR, 2023

Unleashing Text-to-Image Diffusion Models for Visual Perception.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2021
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification.
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems (NeurIPS 2021), 2021

RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Robust Object Detection via Instance-Level Temporal Cycle Confusion.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Multi-Proxy Wasserstein Classifier for Image Classification.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
MetaDistiller: Network Self-Boosting via Meta-Learned Top-Down Distillation.
Proceedings of the European Conference on Computer Vision (ECCV 2020), 2020