Makarand Tapaswi

Orcid: 0000-0001-8800-9015

According to our database, Makarand Tapaswi authored at least 63 papers between 2008 and 2024.

Bibliography

2024
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation.
CoRR, 2024

No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning.
CoRR, 2024

Major Entity Identification: A Generalizable Alternative to Coreference Resolution.
CoRR, 2024

VELOCITI: Can Video-Language Models Bind Semantic Concepts through Time?
CoRR, 2024

FiGCLIP: Fine-Grained CLIP Adaptation via Densely Annotated Videos.
CoRR, 2024

Major Entity Identification: A Generalizable Alternative to Coreference Resolution.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

"Previously on..." from Recaps to Story Summarization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MICap: A Unified Model for Identity-Aware Movie Descriptions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NurtureNet: A Multi-task Video-based Approach for Newborn Anthropometry.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Eye vs. AI: Human Gaze and Model Attention in Video Memorability.
CoRR, 2023

Generalized Cross-domain Multi-label Few-shot Learning for Chest X-rays.
CoRR, 2023

GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering.
Proceedings of the Companion Proceedings of the ACM Web Conference 2023, 2023

Unsupervised Audio-Visual Lecture Segmentation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

How You Feelin'? Learning Emotions and Mental States in Movie Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Test of Time: Instilling Video-Language Models with a Sense of Time.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Can we Adopt Self-supervised Pretraining for Chest X-Rays?
CoRR, 2022

Grounded Video Situation Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Learning Object Manipulation Skills from Video via Approximate Differentiable Physics.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Instruction-driven history-aware policies for robotic manipulations.
Proceedings of the Conference on Robot Learning, 2022

2021
Long term spatio-temporal modeling for action detection.
Comput. Vis. Image Underst., 2021

Feature generation for long-tail classification.
Proceedings of the ICVGIP '21: Indian Conference on Computer Vision, Graphics and Image Processing, Jodhpur, India, December 19, 2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Video Face Clustering With Self-Supervised Representation Learning.
IEEE Trans. Biom. Behav. Identity Sci., 2020

Deep Multimodal Feature Encoding for Video Ordering.
CoRR, 2020

Clustering based Contrastive Learning for Improving Face Representations.
Proceedings of the 15th IEEE International Conference on Automatic Face and Gesture Recognition, 2020

Learning Interactions and Relationships Between Movie Characters.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Object Manipulation Skills via Approximate State Estimation from Real Videos.
Proceedings of the 4th Conference on Robot Learning, 2020

2019
The Shmoop Corpus: A Dataset of Stories with Loosely Aligned Summaries.
CoRR, 2019

Visual Reasoning by Progressive Module Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Video Face Clustering With Unknown Number of Clusters.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Self-Supervised Learning of Face Representations for Video Face Clustering.
Proceedings of the 14th IEEE International Conference on Automatic Face & Gesture Recognition, 2019

2018
Progressive Reasoning by Module Composition.
CoRR, 2018

Now You Shake Me: Towards Automatic 4D Cinema.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

MovieGraphs: Towards Understanding Human-Centric Situations From Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Situation Recognition with Graph Neural Networks.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Story Understanding through Semantic Analysis and Automatic Alignment of Text and Video.
PhD thesis, 2016

Relaxed Earth Mover's Distances for Chain- and Tree-connected Spaces and their use as a Loss Function in Deep Learning.
CoRR, 2016

Naming TV characters by watching and analyzing dialogs.
Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, 2016

MovieQA: Understanding Stories in Movies through Question-Answering.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

Recovering the Missing Link: Predicting Class-Attribute Associations for Unsupervised Zero-Shot Learning.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Aligning plot synopses to videos for story-based retrieval.
Int. J. Multim. Inf. Retr., 2015

Accio: A Data Set for Face Track Retrieval in Movies Across Age.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

KIT at MediaEval 2015 - Evaluating Visual Cues for Affective Impact of Movies Task.
Proceedings of the Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

Improved weak labels using contextual cues for person identification in videos.
Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, 2015

Book2Movie: Aligning video scenes with book chapters.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Story-based Video Retrieval in TV series using Plot Synopses.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Total Cluster: A person agnostic clustering method for broadcast videos.
Proceedings of the 2014 Indian Conference on Computer Vision, 2014

Cleaning up after a face tracker: False positive removal.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

StoryGraphs: Visualizing Character Interactions as a Timeline.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

A time pooled track kernel for person identification.
Proceedings of the 11th IEEE International Conference on Advanced Video and Signal Based Surveillance, 2014

2013
Semi-supervised Learning with Constraints for Person Identification in Multimedia Data.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

2012
KIT at MediaEval 2012 - Content-based Genre Classification with Visual Cues.
Proceedings of the Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Fusion of Speech, Faces and Text for Person Identification in TV Broadcast.
Proceedings of the Computer Vision - ECCV 2012. Workshops and Demonstrations, 2012

"Knock! Knock! Who is it?" probabilistic person identification in TV-series.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012

Contextual Constraints for Person Retrieval in Camera Networks.
Proceedings of the Ninth IEEE International Conference on Advanced Video and Signal-Based Surveillance, 2012

2008
Multilingual spoken-password based user authentication in emerging economies using cellular phone networks.
Proceedings of the 2008 IEEE Spoken Language Technology Workshop, 2008

Audio-Visual Person Authentication with Multiple Visualized-Speech Features and Multiple Face Profiles.
Proceedings of the Sixth Indian Conference on Computer Vision, Graphics & Image Processing, 2008

