EASG-Bench: Video Q&A Benchmark with Egocentric Action Scene Graphs.
CoRR, June 2025
Keystep Recognition using Graph Neural Networks.
CoRR, June 2025
Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection.
CoRR, May 2025
DecompDreamer: Advancing Structured 3D Asset Generation with Multi-Object Decomposition and Gaussian Splatting.
CoRR, March 2025
Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition.
CoRR, January 2025
Ego-VPA: Egocentric Video Understanding with Parameter-Efficient Adaptation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video.
CoRR, 2024
Contrastive Language Video Time Pre-training.
CoRR, 2024
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model.
Proceedings of the European Conference on Computer Vision (ECCV), 2024
Action Scene Graphs for Long-Form Understanding of Egocentric Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
STHG: Spatial-Temporal Heterogeneous Graph Learning for Advanced Audio-Visual Diarization.
CoRR, 2023
Unbiased Scene Graph Generation in Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
SViTT: Temporal Learning of Sparse Video-Text Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Intel Labs at Ego4D Challenge 2022: A Better Baseline for Audio-Visual Diarization.
CoRR, 2022
Learning Long-Term Spatial-Temporal Graphs for Active Speaker Detection.
Proceedings of the European Conference on Computer Vision (ECCV), 2022
Learning Spatial-Temporal Graphs for Active Speaker Detection.
CoRR, 2021
Integrating Human Gaze into Attention for Egocentric Activity Recognition.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021
Adversarial Background-Aware Loss for Weakly-Supervised Temporal Activity Localization.
Proceedings of the European Conference on Computer Vision (ECCV), 2020
TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019
Hierarchical Novelty Detection for Visual Object Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018