Gedas Bertasius

According to our database, Gedas Bertasius authored at least 54 papers between 2015 and 2024.

Bibliography

2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos.
CoRR, 2024

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos.
CoRR, 2024

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos.
CoRR, 2024

Siamese Vision Transformers are Scalable Audio-visual Learners.
CoRR, 2024

Augmented Reality Demonstrations for Scalable Robot Imitation Learning.
CoRR, 2024

DAM: Dynamic Adapter Merging for Continual Video QA Learning.
CoRR, 2024

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences.
CoRR, 2024

A Simple LLM Framework for Long-Range Video Question-Answering.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

4DIFF: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation.
Proceedings of the Computer Vision - ECCV 2024, 2024

LoCoNet: Long-Short Context Network for Active Speaker Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Video ReCap: Recursive Captioning of Hour-Long Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Building Secure and Engaging Video Communication by Using Monitor Illumination.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
MuMUR: Multilingual Multimodal Universal Retrieval.
Inf. Retr. J., June, 2023

RGNet: A Unified Retrieval and Grounding Network for Long Videos.
CoRR, 2023

LoCoNet: Long-Short Context Network for Active Speaker Detection.
CoRR, 2023

Unified Coarse-to-Fine Alignment for Video-Text Retrieval.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SimpleClick: Interactive Image Segmentation with Simple Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Improving Video Retrieval Using Multilingual Knowledge Transfer.
Proceedings of the Advances in Information Retrieval, 2023

Vision Transformers are Parameter-Efficient Audio-Visual Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Efficient Movie Scene Detection using State-Space Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VindLU: A Recipe for Effective Video-and-Language Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism.
CoRR, 2022

Learning to Retrieve Videos by Asking Questions.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

EclipSE: Efficient Long-Range Video Retrieval Using Sight and Sound.
Proceedings of the Computer Vision - ECCV 2022, 2022

Long Movie Clip Classification with State-Space Video Models.
Proceedings of the Computer Vision - ECCV 2022, 2022

TallFormer: Temporal Action Localization with a Long-Memory Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Long-Short Temporal Contrastive Learning of Video Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Learning To Recognize Procedural Activities with Distant Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Supervoxel Attention Graphs for Long-Range Video Modeling.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Is Space-Time Attention All You Need for Video Understanding?
Proceedings of the 38th International Conference on Machine Learning, 2021

Vx2Text: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Classifying, Segmenting, and Tracking Object Instances in Video with Mask Propagation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attentive Action and Context Factorization.
Proceedings of the 31st British Machine Vision Conference 2020, 2020

2019
Learning Temporal Pose Estimation from Sparsely-Labeled Videos.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Learning Discriminative Motion Features Through Detection.
CoRR, 2018

Object Detection in Video with Spatiotemporal Sampling Networks.
Proceedings of the Computer Vision - ECCV 2018, 2018

Egocentric Basketball Motion Planning From a Single First-Person Image.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
First-Person Action-Object Detection with EgoNet.
Proceedings of the Robotics: Science and Systems XIII, 2017

Using Cross-Model EgoSupervision to Learn Cooperative Basketball Intention.
Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops, 2017

Am I a Baller? Basketball Performance Assessment from First-Person Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Unsupervised Learning of Important Objects from First-Person Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Convolutional Random Walk Networks for Semantic Image Segmentation.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Local Perturb-and-MAP for Structured Prediction.
Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017

2016
Exploiting Visual-Spatial First-Person Co-Occurrence for Action-Object Detection without Labels.
CoRR, 2016

Am I a Baller? Basketball Skill Assessment using First-Person Cameras.
CoRR, 2016

Automatic Lymph Node Cluster Segmentation Using Holistically-Nested Neural Networks and Structured Optimization in CT Images.
Proceedings of the Medical Image Computing and Computer-Assisted Intervention - MICCAI 2016, 2016

Semantic Segmentation with Boundary Neural Fields.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Exploiting Egocentric Object Prior for 3D Saliency Detection.
CoRR, 2015

High-for-Low and Low-for-High: Efficient Boundary Detection from Deep Object Features and Its Applications to High-Level Vision.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

DeepEdge: A multi-scale bifurcated deep network for top-down contour detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
