A. J. Piergiovanni

CoRR, 2023

Joint Adaptive Representations for Image-Language Learning.

CoRR, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model.

CoRR, 2023

Open-Vocabulary Object Detection upon Frozen Vision and Language Models.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

PaLI: A Jointly-Scaled Multilingual Language-Image Model.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compound Tokens: Channel Fusion for Vision-Language Representation Learning.

Maxwell Mbabilla Aladago

Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning.

Weicheng Kuo

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models.

CoRR, 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model.

CoRR, 2022

Pre-training image-language transformers for open-vocabulary tasks.

Weicheng Kuo

CoRR, 2022

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering.

CoRR, 2022

Video Question Answering with Iterative Video-Text Co-tokenization.

Proceedings of the Computer Vision - ECCV 2022, 2022

FindIt: Generalized Localization with Natural Language Queries.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?

CoRR, 2021

Unsupervised Action Segmentation for Instructional Videos.

CoRR, 2021

TokenLearner: Adaptive Space-Time Tokenization for Videos.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

4D-Net for Learned Multi-Modal Alignment.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Recognizing Actions in Videos From Unseen Viewpoints.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Adaptive Intermediate Representations for Video Understanding.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Unsupervised Discovery of Actions in Instructional Videos.

Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020

Correction to: Model-Based Robot Imitation with Future Image Similarity.

Int. J. Comput. Vis., 2020

Model-Based Robot Imitation with Future Image Similarity.

Int. J. Comput. Vis., 2020

Learning Multimodal Representations for Unseen Activities.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

AViD Dataset: Anonymized Videos from Diverse Countries.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures.

Proceedings of the 8th International Conference on Learning Representations, 2020

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification.

Proceedings of the Computer Vision - ECCV 2020, 2020

AssembleNet++: Assembling Modality Representations via Attention Connections.

Proceedings of the Computer Vision - ECCV 2020, 2020

Adversarial Generative Grammars for Human Activity Prediction.

Proceedings of the Computer Vision - ECCV 2020, 2020

Evolving Losses for Unsupervised Video Representation Learning.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Differentiable Grammars for Videos.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Tiny Video Networks.

CoRR, 2019

Evolving Losses for Unlabeled Video Representation Learning.

CoRR, 2019

Learning Differentiable Grammars for Continuous Data.

CoRR, 2019

Learning Real-World Robot Policies by Dreaming.

Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Temporal Gaussian Mixture Layer for Videos.

Proceedings of the 36th International Conference on Machine Learning, 2019

Evolving Space-Time Neural Architectures for Videos.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Early Detection of Injuries in MLB Pitchers From Video.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Representation Flow for Action Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Model-based Behavioral Cloning with Future Image Similarity Learning.

Proceedings of the 3rd Annual Conference on Robot Learning, 2019

2018

Learning Shared Multimodal Embeddings with Unpaired Data.

CoRR, 2018

Activity Detection with Latent Sub-event Hierarchy Learning.

CoRR, 2018

Action-Conditioned Convolutional Future Regression Models for Robot Imitation Learning.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Fine-Grained Activity Recognition in Baseball Videos.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Learning Latent Super-Events to Detect Multiple Activities in Videos.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Learning Latent Subevents in Activity Videos Using Temporal Attention Filters.

Chenyou Fan

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

Temporal attention filters for human activity recognition in videos.

Chenyou Fan

CoRR, 2016

2015

Computational principles underlying people's behavior explanations.
