2025
Transfer between Modalities with MetaQueries.
CoRR, April, 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop.
CoRR, March, 2025

2024
Learning Temporal Distribution and Spatial Correlation Toward Universal Moving Object Segmentation.
IEEE Trans. Image Process., 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
CoRR, 2024

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Kosmos-G: Generating Images in Context with Multimodal Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Image Sculpting: Precise Object Editing with 3D Geometry Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Learning Temporal Distribution and Spatial Correlation for Universal Moving Object Segmentation.
CoRR, 2023

2022
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022