2025
MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs.
CoRR, June, 2025

FT2TF: First-Person Statement Text-to-Talking Face Generation.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

2024
Is Attention All You Need For Actigraphy? Foundation Models of Wearable Accelerometer Data for Mental Health Research.
CoRR, 2024

Multi-layer Learnable Attention Mask for Multimodal Tasks.
CoRR, 2024

LangNav: Language as a Perceptual Representation for Navigation.
Findings of the Association for Computational Linguistics: NAACL 2024, 2024

2023
Learning Human Action Recognition Representations Without Real Humans.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Leveraging Temporal Context in Low Representational Power Regimes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
How Transferable are Video Representations Based on Synthetic Data?
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Cross-Modal Discrete Representation Learning.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2019
Half&Half: New Tasks and Benchmarks for Studying Visual Common Sense.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Automatic Adaptation of Object Detectors to New Domains Using Self-Training.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Unsupervised Hard Example Mining from Videos for Improved Object Detection.
Computer Vision - ECCV 2018, 2018

2017
End-to-End Face Detection and Cast Grouping in Movies Using Erdős-Rényi Clustering.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2014
A randomized algorithm for natural object colorization.
Comput. Graph. Forum, 2014

Jigsaw puzzle image retrieval via pairwise compatibility measurement.
Proceedings of the International Conference on Big Data and Smart Computing, BIGCOMP 2014, 2014

2013
Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model.
KSII Trans. Internet Inf. Syst., 2013

Depth consistency evaluation for error-pose detection.
Proceedings of the Sixth International Conference on Machine Vision, 2013

Clustering space-time interest points for action representation.
Proceedings of the Sixth International Conference on Machine Vision, 2013

2012
An Intelligent Multi-Sensor Surveillance System for Elderly Care.
Smart Comput. Rev., 2012

Essential Body-Joint and Atomic Action Detection for Human Activity Recognition Using Longest Common Subsequence Algorithm.
Computer Vision - ACCV 2012 Workshops, 2012