Saining Xie

According to our database, Saining Xie authored at least 57 papers between 2012 and 2024.

Bibliography

2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think.
CoRR, 2024

DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing.
CoRR, 2024

AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark.
CoRR, 2024

On Scaling Up 3D Gaussian Splatting Training.
CoRR, 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs.
CoRR, 2024

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning.
CoRR, 2024

Deconstructing Denoising Diffusion Models for Self-Supervised Learning.
CoRR, 2024

What Does a Visual Formal Analysis of the World's 500 Most Famous Paintings Tell Us About Multimodal LLMs?
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Demystifying CLIP Data.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Altogether: Image Captioning via Re-aligning Alt-text.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

V-IRL: Grounding Virtual Intelligence in Real Life.
Proceedings of the Computer Vision - ECCV 2024, 2024

SiT: Exploring Flow and Diffusion-Based Generative Models with Scalable Interpolant Transformers.
Proceedings of the Computer Vision - ECCV 2024, 2024

Fast Encoding and Decoding for Implicit Video Representation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Image Sculpting: Precise Object Editing with 3D Geometry Control.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MoDE: CLIP Data Experts via Clustering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Going Denser with Open-Vocabulary Part Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scalable Diffusion Models with Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

CiT: Curation in Training for Effective Vision-Language Data.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Sample-Efficient Neural Architecture Search by Learning Actions for Monte Carlo Tree Search.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Exploring Long-Sequence Masked Autoencoders.
CoRR, 2022

SLIP: Self-supervision Meets Language-Image Pre-training.
Proceedings of the Computer Vision - ECCV 2022, 2022

Masked Autoencoders Are Scalable Vision Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked Feature Prediction for Self-Supervised Visual Pre-Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

A ConvNet for the 2020s.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision.
CoRR, 2021

Benchmarking Detection Transfer Learning with Vision Transformers.
CoRR, 2021

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Pri3D: Can 3D Priors Help 2D Representation Learning?
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

An Empirical Study of Training Self-Supervised Vision Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploring Data-Efficient 3D Scene Understanding With Contrastive Scene Contexts.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Graph Structure of Neural Networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

Decoupling Representation and Classifier for Long-Tailed Recognition.
Proceedings of the 8th International Conference on Learning Representations, 2020

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding.
Proceedings of the Computer Vision - ECCV 2020, 2020

Are Labels Necessary for Neural Architecture Search?
Proceedings of the Computer Vision - ECCV 2020, 2020

FBNetV2: Differentiable Neural Architecture Search for Spatial and Channel Dimensions.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Momentum Contrast for Unsupervised Visual Representation Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Sample-Efficient Neural Architecture Search by Learning Action Space.
CoRR, 2019

Exploring Randomly Wired Neural Networks for Image Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

On Network Design Spaces for Visual Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Order-Aware Generative Modeling Using the 3D-Craft Dataset.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Deep Representation Learning with Induced Structural Priors.
PhD thesis, 2018

Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification.
Proceedings of the Computer Vision - ECCV 2018, 2018

Attentional ShapeContextNet for Point Cloud Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Holistically-Nested Edge Detection.
Int. J. Comput. Vis., 2017

Rethinking Spatiotemporal Feature Learning For Video Understanding.
CoRR, 2017

Aggregated Residual Transformations for Deep Neural Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Top-Down Learning for Structured Labeling with Convolutional Pseudoprior.
Proceedings of the Computer Vision - ECCV 2016, 2016

2015
Convolutional Pseudo-Prior for Structured Labeling.
CoRR, 2015

Hyper-class augmented and regularized deep learning for fine-grained image classification.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Deeply-Supervised Nets.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Pairwise constrained concept factorization for data representation.
Neural Networks, 2014

Semi-supervised non-negative matrix factorization for image clustering with graph Laplacian.
Multim. Tools Appl., 2014

2013
Perception Preserving Projections.
Proceedings of the British Machine Vision Conference, 2013

2012
Multi-task co-clustering via nonnegative matrix factorization.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

