Michael S. Ryoo

Orcid: 0000-0002-5452-8332

Affiliations:
  • Stony Brook University, Department of Computer Science, NY, USA
  • Google Brain
  • Indiana University Bloomington, IN, USA
  • NASA Jet Propulsion Laboratory (JPL), Pasadena, CA, USA
  • University of Texas at Austin, Computer and Vision Research Center, TX, USA (PhD 2008)
  • Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea


According to our database, Michael S. Ryoo authored at least 149 papers between 2004 and 2024.

Bibliography

2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs.
CoRR, 2024

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations.
CoRR, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
CoRR, 2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy.
CoRR, 2024

Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA.
CoRR, 2024

Understanding Long Videos in One Multimodal Language Model Pass.
CoRR, 2024

Language Repository for Long Video Understanding.
CoRR, 2024

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Grafting Vision Transformers.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Diffusion Illusions: Hiding Images in Plain Sight.
Proceedings of the ACM SIGGRAPH 2024 Conference Papers, 2024

Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

CoPT: Unsupervised Domain Adaptive Segmentation Using Domain-Agnostic Text Embeddings.
Proceedings of the Computer Vision - ECCV 2024, 2024

Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VicTR: Video-conditioned Text Representations for Activity Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MAGICK: A Large-Scale Captioned Dataset from Matting Generated Images Using Chroma Keying.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
StARformer: Transformer With State-Action-Reward Representations for Robot Learning.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2023

AAN: Attributes-Aware Network for Temporal Action Detection.
CoRR, 2023

Active Reinforcement Learning under Limited Visual Observability.
CoRR, 2023

ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023


Active Vision Reinforcement Learning under Limited Visual Observability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Language-based Action Concept Spaces Improve Video Self-Supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning.
Proceedings of the 18th International Conference on Machine Vision and Applications, 2023

SWAT: Spatial Structure Within and Among Tokens.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Energy-Based Models for Cross-Modal Localization using Convolutional Transformers.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Open-vocabulary Queryable Scene Representations for Real World Planning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reducing Inference Latency with Concurrent Architectures for Image Recognition at Edge.
Proceedings of the IEEE International Conference on Edge Computing and Communications, 2023

Token Turing Machines.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023


Attributes-Aware Network for Temporal Action Detection.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Weakly-Guided Self-Supervised Pretraining for Temporal Activity Detection.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors.
CoRR, 2022

Video + CLIP Baseline for Ego4D Long-term Action Anticipation.
CoRR, 2022

Neural Neural Textures Make Sim2Real Consistent.
CoRR, 2022

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language.
CoRR, 2022

Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Hybrid Random Features.
Proceedings of the Tenth International Conference on Learning Representations, 2022

StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Video Question Answering with Iterative Video-Text Co-tokenization.
Proceedings of the Computer Vision - ECCV 2022, 2022

Self-supervised Video Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

TRITON: Neural Neural Textures for Better Sim2Real.
Proceedings of the Conference on Robot Learning, 2022

2021
STC-mix: Space, Time, Channel mixing for Self-supervised Video Representation.
CoRR, 2021

Self-supervised Pretraining with Classification Labels for Temporal Activity Detection.
CoRR, 2021

StARformer: Transformer with State-Action-Reward Representations.
CoRR, 2021

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
CoRR, 2021

Unsupervised Action Segmentation for Instructional Videos.
CoRR, 2021

TokenLearner: Adaptive Space-Time Tokenization for Videos.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Supervised Disentangled Representation Learning for Third-Person Imitation Learning.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Visionary: Vision architecture discovery for robot learning.
Proceedings of the IEEE International Conference on Robotics and Automation, 2021

4D-Net for Learned Multi-Modal Alignment.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Recognizing Actions in Videos From Unseen Viewpoints.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Adaptive Intermediate Representations for Video Understanding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Coarse-Fine Networks for Temporal Activity Detection in Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Unsupervised Discovery of Actions in Instructional Videos.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Toward Collaborative Inferencing of Deep Neural Networks on Internet-of-Things Devices.
IEEE Internet Things J., 2020

Correction to: Model-Based Robot Imitation with Future Image Similarity.
Int. J. Comput. Vis., 2020

Model-Based Robot Imitation with Future Image Similarity.
Int. J. Comput. Vis., 2020

Reducing Inference Latency with Concurrent Architectures for Image Recognition.
CoRR, 2020

Edge-Tailored Perception: Fast Inferencing in-the-Edge with Efficient Model Distribution.
CoRR, 2020

Learning Multimodal Representations for Unseen Activities.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

AViD Dataset: Anonymized Videos from Diverse Countries.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures.
Proceedings of the 8th International Conference on Learning Representations, 2020

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification.
Proceedings of the Computer Vision - ECCV 2020, 2020

AssembleNet++: Assembling Modality Representations via Attention Connections.
Proceedings of the Computer Vision - ECCV 2020, 2020

Adversarial Generative Grammars for Human Activity Prediction.
Proceedings of the Computer Vision - ECCV 2020, 2020

Password-Conditioned Anonymization and Deanonymization with Face Identity Transformers.
Proceedings of the Computer Vision - ECCV 2020, 2020

Evolving Losses for Unsupervised Video Representation Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Differentiable Grammars for Videos.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Tiny Video Networks.
CoRR, 2019

Evolving Losses for Unlabeled Video Representation Learning.
CoRR, 2019

Learning Differentiable Grammars for Continuous Data.
CoRR, 2019

Collaborative Execution of Deep Neural Networks on Internet of Things Devices.
CoRR, 2019

Learning Real-World Robot Policies by Dreaming.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Privacy-Preserving Robot Vision with Anonymized Faces by Extreme Low Resolution.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Temporal Gaussian Mixture Layer for Videos.
Proceedings of the 36th International Conference on Machine Learning, 2019

Evolving Space-Time Neural Architectures for Videos.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Robustly Executing DNNs in IoT Systems Using Coded Distributed Computing.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Early Detection of Injuries in MLB Pitchers From Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Representation Flow for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Model-based Behavioral Cloning with Future Image Similarity Learning.
Proceedings of the 3rd Annual Conference on Robot Learning, 2019

2018
Distributed Perception by Collaborative Robots.
IEEE Robotics Autom. Lett., 2018

Learning Shared Multimodal Embeddings with Unpaired Data.
CoRR, 2018

Activity Detection with Latent Sub-event Hierarchy Learning.
CoRR, 2018

Musical Chair: Efficient Real-Time Recognition Using Collaborative IoT Devices.
CoRR, 2018

Joint Person Segmentation and Identification in Synchronized First- and Third-Person Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018

Learning to Anonymize Faces for Privacy Preserving Action Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

Forecasting Hands and Objects in Future Frames.
Proceedings of the Computer Vision - ECCV 2018 Workshops, 2018

Action-Conditioned Convolutional Future Regression Models for Robot Imitation Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Fine-Grained Activity Recognition in Baseball Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Learning Latent Super-Events to Detect Multiple Activities in Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Real-Time Image Recognition Using Collaborative IoT Devices.
Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning, 2018

Extreme Low Resolution Activity Recognition With Multi-Siamese Embedding Learning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Forecasting Hand and Object Locations in Future Frames.
CoRR, 2017

Learning robot activities from first-person human videos using convolutional future regression.
Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2017

Multi-Type Activity Recognition from a Robot's Viewpoint.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Learning social affordance grammar from videos: Transferring human interactions to human-robot interactions.
Proceedings of the 2017 IEEE International Conference on Robotics and Automation, 2017

A holistic approach to interpreting human states in smart environments providing high quality of life.
Proceedings of the Seventh International Conference on Emerging Security Technologies, 2017

Identifying First-Person Camera Wearers in Third-Person Videos.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Privacy-Preserving Human Activity Recognition from Extreme Low Resolution.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

Learning Latent Subevents in Activity Videos Using Temporal Attention Filters.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Multitype Activity Recognition in Robot-Centric Scenarios.
IEEE Robotics Autom. Lett., 2016

First-Person Activity Recognition: Feature, Temporal Structure, and Prediction.
Int. J. Comput. Vis., 2016

Privacy-Preserving Egocentric Activity Recognition from Extreme Low Resolution.
CoRR, 2016

Temporal attention filters for human activity recognition in videos.
CoRR, 2016

Learning Social Affordance for Human-Robot Interaction.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
Building Unified Human Descriptors For Multi-Type Activity Recognition.
CoRR, 2015

Robot-centric Activity Recognition from First-Person RGB-D Videos.
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015

Robot-Centric Activity Prediction from First-Person Videos: What Will They Do to Me?
Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction, 2015

Pooled motion features for first-person videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Early Recognition of Human Activities from First-Person Videos Using Onset Representations.
CoRR, 2014

First-Person Animal Activity Recognition from Egocentric Videos.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

An Introduction to the 3rd Workshop on Egocentric (First-Person) Vision.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Personal driving diary: Automated recognition of driving events from first-person videos.
Comput. Vis. Image Underst., 2013

First-Person Activity Recognition: What Are They Doing to Me?
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Recognizing Humans in Motion: Trajectory-based Aerial Video Analysis.
Proceedings of the British Machine Vision Conference, 2013

2012
Toward a unified framework of motion understanding.
Image Vis. Comput., 2012

Reliable object detection and segmentation using inpainting.
Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012

2011
Stochastic Representation and Recognition of High-Level Group Activities.
Int. J. Comput. Vis., 2011

Human activity analysis: A review.
ACM Comput. Surv., 2011

One video is sufficient? Human activity recognition using active video composition.
Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV 2011), 2011

Personal driving diary: Constructing a video archive of everyday driving events.
Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV 2011), 2011

Compensating for visually missing features: Scale adaptive recognition of objects using probabilistic voting.
Proceedings of the 8th International Conference on Ubiquitous Robots and Ambient Intelligence, 2011

Background-aware pedestrian/vehicle detection system for driving environments.
Proceedings of the 14th International IEEE Conference on Intelligent Transportation Systems, 2011

Interactive learning of human activities using active video composition.
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011

Human activity prediction: Early recognition of ongoing activities from streaming videos.
Proceedings of the IEEE International Conference on Computer Vision, 2011

2010
A task-driven intelligent workspace system to provide guidance feedback.
Comput. Vis. Image Underst., 2010

An Overview of Contest on Semantic Description of Human Activities (SDHA) 2010.
Proceedings of the Recognizing Patterns in Signals, Speech, Images and Videos, 2010

Video scene analysis of interactions between humans and vehicles using event context.
Proceedings of the 9th ACM International Conference on Image and Video Retrieval, 2010

2009
Real-Time Illegal Parking Detection in Outdoor Environments Using 1-D Transformation.
IEEE Trans. Circuits Syst. Video Technol., 2009

Detection of object abandonment using temporal logic.
Mach. Vis. Appl., 2009

Semantic Representation and Recognition of Continued and Recursive Human Activities.
Int. J. Comput. Vis., 2009

Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities.
Proceedings of the IEEE 12th International Conference on Computer Vision, ICCV 2009, Kyoto, Japan, September 27, 2009

Stochastic representation and recognition of high-level group activities: Describing structural uncertainties in human activities.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009

2008
Human activities: Handling uncertainties using fuzzy time intervals.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Observe-and-explain: A new approach for multiple hypotheses tracking of humans and objects.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008

2007
Robust Human-Computer Interaction System Guiding a User by Providing Feedback.
Proceedings of the IJCAI 2007, 2007

Hierarchical Recognition of Human Activities Interacting with Objects.
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007

Real-time detection of illegally parked vehicles using 1-D transformation.
Proceedings of the Fourth IEEE International Conference on Advanced Video and Signal Based Surveillance, 2007

Detection of abandoned objects in crowded environments.
Proceedings of the Fourth IEEE International Conference on Advanced Video and Signal Based Surveillance, 2007

2006
Semantic Understanding of Continued and Recursive Human Activities.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Recognition of Composite Human Activities through Context-Free Grammar Based Representation.
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2006), 2006

2005
Evolving neural network ensembles for control problems.
Proceedings of the Genetic and Evolutionary Computation Conference, 2005

Affective Dialogue Communication System with Emotional Memories for Humanoid Robots.
Proceedings of the Affective Computing and Intelligent Interaction, 2005

2004
Affective communication system with multimodality for a humanoid robot, AMI.
Proceedings of the 4th IEEE/RAS International Conference on Humanoid Robots, 2004

