DINOv2: Learning Robust Visual Features without Supervision.
Trans. Mach. Learn. Res., 2024
DataComp-LM: In search of the next generation of training sets for language models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Scalable Pre-training of Large Autoregressive Image Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
ResMLP: Feedforward Networks for Image Classification With Data-Efficient Training.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2023
Image Compression with Product Quantized Masked Image Modeling.
Trans. Mach. Learn. Res., 2023
Are Visual Recognition Models Robust to Image Compression?
CoRR, 2023
Improving Statistical Fidelity for Neural Image Compression with Implicit Local Likelihood Models.
Proceedings of the International Conference on Machine Learning, 2023
Variable Rate Allocation for Vector-Quantized Autoencoders.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
OmniMAE: Single Model Masked Pretraining on Images and Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
ImageBind: One Embedding Space to Bind Them All.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Three Things Everyone Should Know About Vision Transformers.
Proceedings of the European Conference on Computer Vision (ECCV), 2022
Augmenting Convolutional networks with attention-based aggregation.
CoRR, 2021
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
CoRR, 2021
XCiT: Cross-Covariance Image Transformers.
CoRR, 2021
ResMLP: Feedforward networks for image classification with data-efficient training.
CoRR, 2021
Training Vision Transformers for Image Retrieval.
CoRR, 2021
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Skip-Clip: Self-Supervised Spatiotemporal Representation Learning by Future Clip Order Ranking.
CoRR, 2019
Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Keep Drawing It: Iterative language-based image generation and editing.
CoRR, 2018
Real-Time End-to-End Action Detection with Two-Stream Networks.
Proceedings of the 15th Conference on Computer and Robot Vision, 2018