Tengda Han

CoRR, 2024

CountGD: Multi-Modal Open-World Counting.

Niki Amini-Naieni

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Multi-sentence Grounding for Long-Term Instructional Video.

Proceedings of the Computer Vision - ECCV 2024, 2024

Learning to Count Without Annotations.

Lukas Knobel

Yuki M. Asano

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AutoAD III: The Prequel - Back to the Pixels.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description.

Proceedings of the Computer Vision - ACCV 2024, 2024

It's Just Another Day: Unique Video Captioning by Discriminative Prompting.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

A Strong Baseline for Temporal Video-Text Alignment.

CoRR, 2023

Semantic Counting from Self-Collages.

Lukas Knobel

Yuki M. Asano

CoRR, 2023

Open-world Text-specified Object Counting.

CoRR, 2023

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AutoAD II: The Sequel - Who, When, and What in Movie Audio Description.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

AutoAD: Movie Description in Context.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Open-world Text-specified Object Counting.

Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022

Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers.

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

CoRR, 2022

Flamingo: a Visual Language Model for Few-Shot Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Prompting Visual-Language Models for Efficient Video Understanding.

Proceedings of the Computer Vision - ECCV 2022, 2022

Temporal Alignment Networks for Long-term Video.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Turbo Training with Token Dropout.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2020

Self-supervised Co-Training for Video Representation Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Memory-Augmented Dense Predictive Coding for Video Representation Learning.

Proceedings of the Computer Vision - ECCV 2020, 2020

2019

Video Representation Learning by Dense Predictive Coding.
