2025
Step1X-Edit: A Practical Framework for General Image Editing.
CoRR, April, 2025

Perception-R1: Pioneering Perception Policy with Reinforcement Learning.
CoRR, April, 2025

Perception in Reflection.
CoRR, April, 2025

Taming Teacher Forcing for Masked Autoregressive Video Generation.
CoRR, January, 2025

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.
IEEE Robotics and Automation Letters, July, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.
CoRR, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

GladCoder: Stylized QR Code Generation with Grayscale-Aware Denoising Process.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

WaterDiff: Perceptual Image Watermarks Via Diffusion Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024