2025

Transfer between Modalities with MetaQueries.

Xichen Pan

Satya Narayan Shukla

CoRR, April, 2025

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop.

CoRR, March, 2025

2024

Learning Temporal Distribution and Spatial Correlation Toward Universal Moving Object Segmentation.

IEEE Trans. Image Process., 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.

CoRR, 2024

Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.

Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Kosmos-G: Generating Images in Context with Multimodal Large Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Image Sculpting: Precise Object Editing with 3D Geometry Control.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Learning Temporal Distribution and Spatial Correlation for Universal Moving Object Segmentation.

CoRR, 2023

2022

Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022