Rohit Girdhar

According to our database, Rohit Girdhar authored at least 44 papers between 2014 and 2024.

Bibliography

2024
Human Action Anticipation: A Survey.
CoRR, 2024

Movie Gen: A Cast of Media Foundation Models.
CoRR, 2024

Factorizing Text-to-Video Generation by Explicit Image Conditioning.
Proceedings of the Computer Vision - ECCV 2024, 2024

Generating Illustrated Instructions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InstanceDiffusion: Instance-Level Control for Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Motion-Conditioned Image Animation for Video Editing.
CoRR, 2023

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning.
CoRR, 2023

Learning to Substitute Ingredients in Recipes.
CoRR, 2023

What You Say Is What You Show: Visual Narration Detection in Instructional Videos.
CoRR, 2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OmniMAE: Single Model Masked Pretraining on Images and Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ImageBind: One Embedding Space to Bind Them All.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

HierVL: Learning Hierarchical Video-Language Embeddings.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Cut and Learn for Unsupervised Object Detection and Instance Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Video Representations from Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Detecting Twenty-Thousand Classes Using Image-Level Supervision.
Proceedings of the Computer Vision - ECCV 2022, 2022

Omnivore: A Single Model for Many Visual Modalities.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked-attention Mask Transformer for Universal Image Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Mask2Former for Video Instance Segmentation.
CoRR, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

Physical Reasoning Using Dynamics-Aware Models.
CoRR, 2021

Self-Supervised Pretraining of 3D Features on any Point-Cloud.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

An End-to-End Transformer Model for 3D Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Anticipative Video Transformer.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

3D Spatial Recognition Without Spatially Labeled 3D.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Forward Prediction for Physical Reasoning.
CoRR, 2020

MetaPix: Few-Shot Video Retargeting.
Proceedings of the 8th International Conference on Learning Representations, 2020

CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Learning to Understand People via Local, Global and Temporal Reasoning.
PhD thesis, 2019

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning.
CoRR, 2019

Are we Asking the Right Questions in MovieQA?
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

DistInit: Learning Video Representations Without a Single Labeled Video.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Video Action Transformer Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
A Better Baseline for AVA.
CoRR, 2018

Detect-and-Track: Efficient Pose Estimation in Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Attentional Pooling for Action Recognition.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Binge Watching: Scaling Affordance Learning from Sitcoms.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Cutting through the clutter: Task-relevant features for image matching.
Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, 2016

Learning a Predictable and Generative Vector Representation for Objects.
Proceedings of the Computer Vision - ECCV 2016, 2016

2014
Optimizing Storage Intensive Vision Applications to Device Capacity.
Proceedings of the Computer Vision - ACCV 2014, 2014

