2025
EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs.
CoRR, June, 2025

Keystep Recognition using Graph Neural Networks.
CoRR, June, 2025

Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection.
CoRR, May, 2025

DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting.
CoRR, March, 2025

Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition.
CoRR, January, 2025

Ego-VPA: Egocentric Video Understanding with Parameter-Efficient Adaptation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

2024
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video.
CoRR, 2024

Contrastive Language Video Time Pre-training.
CoRR, 2024

R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Action Scene Graphs for Long-Form Understanding of Egocentric Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization.
CoRR, 2023

Unbiased Scene Graph Generation in Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SViTT: Temporal Learning of Sparse Video-Text Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization.
CoRR, 2022

Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
Learning Spatial-Temporal Graphs for Active Speaker Detection.
CoRR, 2021

Integrating Human Gaze into Attention for Egocentric Activity Recognition.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

2020
Adversarial Background-Aware Loss for Weakly-Supervised Temporal Activity Localization.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Hierarchical Novelty Detection for Visual Object Recognition.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018