2025

Step1X-Edit: A Practical Framework for General Image Editing.

Shiyu Liu, Yucheng Han

CoRR, April, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning.

CoRR, April, 2025

Perception in Reflection.

CoRR, April, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.

CoRR, January, 2025

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.

IEEE Robotics and Automation Letters, July, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.

CoRR, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

GladCoder: Stylized QR Code Generation with Grayscale-Aware Denoising Process.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

WaterDiff: Perceptual Image Watermarks via Diffusion Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024