2025

Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling.

Xiaokang Chen

Zhiyu Wu

CoRR, January, 2025

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction.

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Pruning Self-Attentions Into Convolutional Layers in Single Path.

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding.

CoRR, 2024

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation.

CoRR, 2024

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation.

CoRR, 2024

MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Stitched ViTs are Flexible Vision Backbones.

Proceedings of the Computer Vision - ECCV 2024, 2024

Efficient Stitchable Task Adaptation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

A Survey on Efficient Training of Transformers.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Stitchable Neural Networks.

Zizheng Pan

Jianfei Cai

Bohan Zhuang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dynamic Focus-aware Positional Queries for Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Fast Vision Transformers with HiLo Attention.

Zizheng Pan

Jianfei Cai

Bohan Zhuang

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EcoFormer: Energy-Saving Attention with Linear Complexity.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

Less Is More: Pay Less Attention in Vision Transformers.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Mesa: A Memory-saving Training Framework for Transformers.

CoRR, 2021

Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.

CoRR, 2021

Scalable Visual Transformers with Hierarchical Pooling.

CoRR, 2021

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Scalable Vision Transformers with Hierarchical Pooling.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Object-and-Action Aware Model for Visual Language Navigation.

Proceedings of the Computer Vision - ECCV 2020, 2020