CoRR, September, 2025

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation.

[DOI]

CoRR, April, 2025

Learning from Streaming Video with Orthogonal Gradients.

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

2024

It's Just Another Day: Unique Video Captioning by Discriminative Prompting.

[DOI]

CoRR, 2024

Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school Methods.

[DOI]

João F. Henriques

Dylan Campbell

CoRR, 2024

CountGD: Multi-Modal Open-World Counting.

[DOI]

Niki Amini-Naieni

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-sentence Grounding for Long-Term Instructional Video.

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Learning to Count Without Annotations.

[DOI]

Lukas Knobel

Yuki M. Asano

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AutoAD III: The Prequel - Back to the Pixels.

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers.

[DOI]

Proceedings of the 35th British Machine Vision Conference, 2024

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description.

[DOI]

Proceedings of the Computer Vision - ACCV 2024, 2024

It's Just Another Day: Unique Video Captioning by Discriminative Prompting.

[DOI]

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

A Strong Baseline for Temporal Video-Text Alignment.

[DOI]

CoRR, 2023

Semantic Counting from Self-Collages.

[DOI]

Lukas Knobel

Yuki M. Asano

CoRR, 2023

Open-world Text-specified Object Counting.

[DOI]

CoRR, 2023

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio.

[DOI]

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description.

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AutoAD: Movie Description in Context.

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Open-world Text-specified Object Counting.

[DOI]

Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022

Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers.

[DOI]

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

[DOI]

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Prompting Visual-Language Models for Efficient Video Understanding.

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Temporal Alignment Networks for Long-term Video.

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Turbo Training with Token Dropout.

[DOI]

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2020

Self-supervised Co-Training for Video Representation Learning.

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Memory-Augmented Dense Predictive Coding for Video Representation Learning.

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Video Representation Learning by Dense Predictive Coding.

[DOI]