2025

Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding.

Tao Zhang

Xiangtai Li

CoRR, April, 2025

The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer.

CoRR, April, 2025

Grounding Multimodal Large Language Model in GUI World.

Weixian Lei

Difei Gao

Mike Zheng Shou

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent.

CoRR, 2024

ViT-Lens: Towards Omni-modal Representations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

PCCT: Progressive Class-Center Triplet Loss for Imbalanced Medical Image Classification.

IEEE J. Biomed. Health Informatics, April, 2023

ViT-Lens-2: Gateway to Omni-modal Intelligence.

CoRR, 2023

ViT-Lens: Towards Omni-modal Representations.

CoRR, 2023

Learning to Learn: How to Continuously Teach Humans and Machines.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations.

Muhammet Furkan Ilaslan

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation.

CoRR, 2022

Learning to Learn: How to Continuously Teach Humans and Machines.

CoRR, 2022

PCCT: Progressive Class-Center Triplet Loss for Imbalanced Medical Image Classification.

CoRR, 2022

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant.

Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval.

Computer Vision - ECCV 2022, 2022

2020

Class-Center Involved Triplet Loss for Skin Disease Classification on Imbalanced Data.

Proceedings of the 17th IEEE International Symposium on Biomedical Imaging, 2020