A. J. Piergiovanni

According to our database, A. J. Piergiovanni authored at least 52 papers between 2015 and 2024.

Bibliography

2024
Whats in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning.
CoRR, 2024

SLVP: Self-Supervised Language-Video Pre-Training for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops, 2024

Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024


2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks.
Trans. Mach. Learn. Res., 2023

Diversifying Joint Vision-Language Tokenization Learning.
CoRR, 2023

Joint Adaptive Representations for Image-Language Learning.
CoRR, 2023

PaLI-X: On Scaling up a Multilingual Vision and Language Model.
CoRR, 2023

Open-Vocabulary Object Detection upon Frozen Vision and Language Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

PaLI: A Jointly-Scaled Multilingual Language-Image Model.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Compound Tokens: Channel Fusion for Vision-Language Representation Learning.
Proceedings of the First Tiny Papers Track at ICLR 2023, 2023

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models.
CoRR, 2022

PaLI: A Jointly-Scaled Multilingual Language-Image Model.
CoRR, 2022

Pre-training image-language transformers for open-vocabulary tasks.
CoRR, 2022

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering.
CoRR, 2022

Video Question Answering with Iterative Video-Text Co-tokenization.
Proceedings of the Computer Vision - ECCV 2022, 2022

FindIt: Generalized Localization with Natural Language Queries.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
CoRR, 2021

Unsupervised Action Segmentation for Instructional Videos.
CoRR, 2021

TokenLearner: Adaptive Space-Time Tokenization for Videos.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

4D-Net for Learned Multi-Modal Alignment.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Recognizing Actions in Videos From Unseen Viewpoints.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Adaptive Intermediate Representations for Video Understanding.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Unsupervised Discovery of Actions in Instructional Videos.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Correction to: Model-Based Robot Imitation with Future Image Similarity.
Int. J. Comput. Vis., 2020

Model-Based Robot Imitation with Future Image Similarity.
Int. J. Comput. Vis., 2020

Learning Multimodal Representations for Unseen Activities.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2020

AViD Dataset: Anonymized Videos from Diverse Countries.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures.
Proceedings of the 8th International Conference on Learning Representations, 2020

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification.
Proceedings of the Computer Vision - ECCV 2020, 2020

AssembleNet++: Assembling Modality Representations via Attention Connections.
Proceedings of the Computer Vision - ECCV 2020, 2020

Adversarial Generative Grammars for Human Activity Prediction.
Proceedings of the Computer Vision - ECCV 2020, 2020

Evolving Losses for Unsupervised Video Representation Learning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Differentiable Grammars for Videos.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Tiny Video Networks.
CoRR, 2019

Evolving Losses for Unlabeled Video Representation Learning.
CoRR, 2019

Learning Differentiable Grammars for Continuous Data.
CoRR, 2019

Learning Real-World Robot Policies by Dreaming.
Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2019

Temporal Gaussian Mixture Layer for Videos.
Proceedings of the 36th International Conference on Machine Learning, 2019

Evolving Space-Time Neural Architectures for Videos.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Early Detection of Injuries in MLB Pitchers From Video.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Representation Flow for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Model-based Behavioral Cloning with Future Image Similarity Learning.
Proceedings of the 3rd Annual Conference on Robot Learning, 2019

2018
Learning Shared Multimodal Embeddings with Unpaired Data.
CoRR, 2018

Activity Detection with Latent Sub-event Hierarchy Learning.
CoRR, 2018

Action-Conditioned Convolutional Future Regression Models for Robot Imitation Learning.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Fine-Grained Activity Recognition in Baseball Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

Learning Latent Super-Events to Detect Multiple Activities in Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Learning Latent Subevents in Activity Videos Using Temporal Attention Filters.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Temporal attention filters for human activity recognition in videos.
CoRR, 2016

2015
Computational principles underlying people's behavior explanations.
Proceedings of the 37th Annual Meeting of the Cognitive Science Society, 2015

