Step1X-Edit: A Practical Framework for General Image Editing.
CoRR, April, 2025
Perception-R1: Pioneering Perception Policy with Reinforcement Learning.
CoRR, April, 2025
Perception in Reflection.
CoRR, April, 2025
Taming Teacher Forcing for Masked Autoregressive Video Generation.
CoRR, January, 2025
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.
IEEE Robotics Autom. Lett., July, 2024
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.
CoRR, 2024
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
GladCoder: Stylized QR Code Generation with Grayscale-Aware Denoising Process.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
DreamLLM: Synergistic Multimodal Comprehension and Creation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
WaterDiff: Perceptual Image Watermarks Via Diffusion Model.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024