Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation.
CoRR, April, 2025
Learning from Streaming Video with Orthogonal Gradients.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
It's Just Another Day: Unique Video Captioning by Discriminative Prompting.
CoRR, 2024
Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school Methods.
CoRR, 2024
CountGD: Multi-Modal Open-World Counting.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Multi-sentence Grounding for Long-Term Instructional Video.
Proceedings of the Computer Vision - ECCV 2024, 2024
Learning to Count Without Annotations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
AutoAD III: The Prequel - Back to the Pixels.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Prompt Generation Networks for Input-Space Adaptation of Frozen Vision Transformers.
Proceedings of the 35th British Machine Vision Conference, 2024
AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description.
Proceedings of the Computer Vision - ACCV 2024, 2024
It's Just Another Day: Unique Video Captioning by Discriminative Prompting.
Proceedings of the Computer Vision - ACCV 2024, 2024
A Strong Baseline for Temporal Video-Text Alignment.
CoRR, 2023
Semantic Counting from Self-Collages.
CoRR, 2023
Open-world Text-specified Object Counting.
CoRR, 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
AutoAD II: The Sequel - Who, When, and What in Movie Audio Description.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
AutoAD: Movie Description in Context.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Open-world Text-specified Object Counting.
Proceedings of the 34th British Machine Vision Conference 2023, 2023
Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers.
CoRR, 2022
Flamingo: a Visual Language Model for Few-Shot Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Prompting Visual-Language Models for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2022, 2022
Temporal Alignment Networks for Long-term Video.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Turbo Training with Token Dropout.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022
Self-supervised Co-Training for Video Representation Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Memory-Augmented Dense Predictive Coding for Video Representation Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020
Video Representation Learning by Dense Predictive Coding.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019
Human Action Forecasting by Learning Task Grammars.
CoRR, 2017
Human Pose Forecasting via Deep Markov Models.
Proceedings of the 2017 International Conference on Digital Image Computing: Techniques and Applications, 2017