2025
Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling.
CoRR, January, 2025

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Pruning Self-Attentions Into Convolutional Layers in Single Path.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding.
CoRR, 2024

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation.
CoRR, 2024

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation.
CoRR, 2024

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Stitched ViTs are Flexible Vision Backbones.
Computer Vision - ECCV 2024, 2024

Efficient Stitchable Task Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
A Survey on Efficient Training of Transformers.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Stitchable Neural Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dynamic Focus-aware Positional Queries for Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Fast Vision Transformers with HiLo Attention.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EcoFormer: Energy-Saving Attention with Linear Complexity.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection.
Computer Vision - ECCV 2022, 2022

Less Is More: Pay Less Attention in Vision Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Mesa: A Memory-saving Training Framework for Transformers.
CoRR, 2021

Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
CoRR, 2021

Scalable Visual Transformers with Hierarchical Pooling.
CoRR, 2021

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Scalable Vision Transformers with Hierarchical Pooling.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Object-and-Action Aware Model for Visual Language Navigation.
Computer Vision - ECCV 2020, 2020