Transfer between Modalities with MetaQueries.
CoRR, April, 2025
PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop.
CoRR, March, 2025
Learning Temporal Distribution and Spatial Correlation Toward Universal Moving Object Segmentation.
IEEE Trans. Image Process., 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
CoRR, 2024
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Kosmos-G: Generating Images in Context with Multimodal Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Image Sculpting: Precise Object Editing with 3D Geometry Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Learning Temporal Distribution and Spatial Correlation for Universal Moving Object Segmentation.
CoRR, 2023
Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022