Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling.
CoRR, January, 2025
T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Pruning Self-Attentions Into Convolutional Layers in Single Path.
IEEE Trans. Pattern Anal. Mach. Intell., May, 2024
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding.
CoRR, 2024
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation.
CoRR, 2024
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation.
CoRR, 2024
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Stitched ViTs are Flexible Vision Backbones.
Proceedings of the Computer Vision - ECCV 2024, 2024
Efficient Stitchable Task Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
A Survey on Efficient Training of Transformers.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Stitchable Neural Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Dynamic Focus-aware Positional Queries for Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Fast Vision Transformers with HiLo Attention.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
EcoFormer: Energy-Saving Attention with Linear Complexity.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
An Efficient Spatio-Temporal Pyramid Transformer for Action Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022
Less Is More: Pay Less Attention in Vision Transformers.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
Mesa: A Memory-saving Training Framework for Transformers.
CoRR, 2021
Know What and Know Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
CoRR, 2021
Scalable Visual Transformers with Hierarchical Pooling.
CoRR, 2021
The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Scalable Vision Transformers with Hierarchical Pooling.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Object-and-Action Aware Model for Visual Language Navigation.
Proceedings of the Computer Vision - ECCV 2020, 2020