2025

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills.

[DOI]

,

Michael Y. Fatemi

,

,

Zora Zhiruo Wang

,

,

,

,

Jayanth Srinivasa

,

,

,

CoRR, April, 2025

Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model.

[DOI]

,

,

Charles Fleming

,

CoRR, March, 2025

Compositional Caching for Training-free Open-vocabulary Attribute Detection.

[DOI]

,

Alessandro Conti

,

,

,

Massimiliano Mancini

CoRR, March, 2025

ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction.

[DOI]

,

,

,

Charles Fleming

,

Ramana Rao Kompella

,

,

CoRR, March, 2025

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning.

[DOI]

,

,

,

,

,

CoRR, March, 2025

Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval.

[DOI]

,

Jayanth Srinivasa

,

,

CoRR, March, 2025

Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture.

[DOI]

,

,

,

Ramana Rao Kompella

,

,

CoRR, February, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1.

[DOI]

,

,

,

Shreedhar Jangam

,

Jayanth Srinivasa

,

,

,

CoRR, February, 2025

A First-order Generative Bilevel Optimization Framework for Diffusion Models.

[DOI]

,

,

,

,

Ramana Kompella

,

,

CoRR, February, 2025

Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge.

[DOI]

Nicholas John Eliopoulos

,

,

,

,

George K. Thiruvathukal

,

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images.

[DOI]

,

,

,

,

Chris Xiaoxuan Lu

Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning.

[DOI]

Venkatesh Mishra

,

Bimsara Pathiraja

,

,

,

Jayanth Srinivasa

,

,

,

Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Quantized-ViT Efficient Training via Fisher Matrix Regularization.

[DOI]

,

,

Ramana Kompella

,

Proceedings of the MultiMedia Modeling, 2025

UniMuMo: Unified Text, Music, and Motion Generation.

[DOI]

,

,

,

,

,

,

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification.

[DOI]

,

,

,

,

,

,

,

CoRR, 2024

Diverse Score Distillation.

[DOI]

,

Jayanth Srinivasa

,

,

Shubham Tulsiani

CoRR, 2024

Safeguarding Text-to-Image Generation via Inference-Time Prompt-Noise Optimization.

[DOI]

Jiangweizhi Peng

,

,

,

Charles Fleming

,

CoRR, 2024

QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain.

[DOI]

,

,

,

Cees G. M. Snoek

CoRR, 2024

UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMs.

[DOI]

,

,

,

,

,

,

Xiangliang Zhang

CoRR, 2024

Prompt Diffusion Robustifies Any-Modality Prompt Learning.

[DOI]

,

,

,

,

Ramana Kompella

,

Cees G. M. Snoek

CoRR, 2024

Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry.

[DOI]

,

,

,

,

CoRR, 2024

Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization.

[DOI]

,

,

,

,

CoRR, 2024

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection.

[DOI]

,

,

,

,

,

,

Jenq-Neng Hwang

,

,

Wen-Huang Cheng

CoRR, 2024

A Survey on Large Language Model-Based Game Agents.

[DOI]

,

Tiansheng Huang

,

,

,

,

Ramana Kompella

,

CoRR, 2024

Training-Free Semantic Segmentation via LLM-Supervision.

[DOI]

,

,

,

Ramana Kompella

,

Cees G. M. Snoek

CoRR, 2024

Urban Scene Diffusion through Semantic Occupancy Map.

[DOI]

,

,

,

Ramana Rao Kompella

,

,

CoRR, 2024

Adaptive Deep Neural Network Inference Optimization with EENet.

[DOI]

,

,

,

Tiansheng Huang

,

,

,

,

,

Ramana Kompella

,

,

,

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models.

[DOI]

,

,

,

,

,

,

,

,

Ramana Kompella

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

From Trojan Horses to Castle Walls: Unveiling Bilateral Data Poisoning Effects in Diffusion Models.

[DOI]

,

,

,

,

,

Ramana Kompella

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference.

[DOI]

,

,

,

,

Ramana Kompella

,

,

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Advancing the Robustness of Large Language Models through Self-Denoised Smoothing.

[DOI]

,

,

,

,

,

,

,

,

,

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check.

[DOI]

,

,

Hsiang-Wei Huang

,

,

,

,

,

,

Jenq-Neng Hwang

Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

CenterRadarNet: Joint 3D Object Detection and Tracking Framework Using 4D FMCW Radar.

[DOI]

,

,

,

,

,

Jenq-Neng Hwang

Proceedings of the IEEE International Conference on Image Processing, 2024

A Method for Bilevel Optimization with Convex Lower-Level Problem.

[DOI]

,

Santiago Paternain

,

,

Ramana Kompella

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Variance Reduction Can Improve Trade-Off in Multi-Objective Learning.

[DOI]

Heshan Devaka Fernando

,

,

,

,

,

Subhajit Chaudhury

,

Keerthiram Murugesan

,

,

,

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Open-world Multi-label Text Classification with Extremely Weak Supervision.

[DOI]

,

,

,

Jayanth Srinivasa

,

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding.

[DOI]

,

,

,

Proceedings of the Computer Vision - ECCV 2024, 2024

Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities.

[DOI]

,

,

,

Charles Fleming

,

Chris Xiaoxuan Lu

Proceedings of the Computer Vision - ECCV 2024, 2024

Enhancing Post-Training Quantization Calibration Through Contrastive Learning.

[DOI]

,

,

Ramana Rao Kompella

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Multitask Dense Predictor via Binarization.

[DOI]

,

,

,

Ramana Rao Kompella

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning.

[DOI]

,

Massimiliano Mancini

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Riemannian Multinomial Logistics Regression for SPD Neural Networks.

[DOI]

,

,

,

Ramana Rao Kompella

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Answer is All You Need: Instruction-following Text Embedding via Answering the Question.

[DOI]

,

,

,

Jayanth Srinivasa

,

,

,

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

WaveFormer: Wavelet Transformer for Noise-Robust Video Inpainting.

[DOI]

,

,

,

,

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models.

[DOI]

,

,

,

,

,

Ramana Rao Kompella

,

CoRR, 2023

CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar.

[DOI]

,

,

,

,

Jenq-Neng Hwang

CoRR, 2023

Fast and Resource-Efficient Object Tracking on Edge Devices: A Measurement Study.

[DOI]

Sanjana Vijay Ganesh

,

,

,

Ramana Kompella

,

CoRR, 2023

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models.

[DOI]

,

,

,

,

,

,

,

CoRR, 2023

Riemannian Multiclass Logistics Regression for SPD Neural Networks.

[DOI]

,

,

,

Ramana Rao Kompella

,

,

CoRR, 2023

Model Sparsification Can Simplify Machine Unlearning.

[DOI]

,

,

,

,

,

,

,

CoRR, 2023

Optical Flow Estimation in 360° Videos: Dataset, Model and Application.

[DOI]

,

Keshav Bhandari

,

,

CoRR, 2023

EENet: Learning to Early Exit for Adaptive Inference.

[DOI]

,

,

,

,

,

,

Ramana Kompella

,

,

CoRR, 2023

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning.

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling.

[DOI]

,

,

,

,

,

Jayanth Srinivasa

,

Ramana Kompella

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Model Sparsity Can Simplify Machine Unlearning.

[DOI]

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Spatially-Aware Human-Object Interaction Detection with Cross-Modal Enhancement.

[DOI]

,

,

,

,

,

Proceedings of the Neural Information Processing - 30th International Conference, 2023

Causal-DFQ: Causality Guided Data-free Network Quantization.

[DOI]

,

,

,

Ramana Rao Kompella

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Network Specialization via Feature-level Knowledge Distillation.

[DOI]

,

,

,

Ramana Kompella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Amplifying Object Tracking Performance on Edge Devices.

[DOI]

Sanjana Vijay Ganesh

,

,

,

Ramana Kompella

,

Proceedings of the 5th IEEE International Conference on Cognitive Machine Intelligence, 2023

2022

Learning Omnidirectional Flow in 360-degree Video via Siamese Representation.

[DOI]

Keshav Bhandari

,

,

,

,

,

CoRR, 2022

Learning Omnidirectional Flow in 360° Video via Siamese Representation.

[DOI]

Keshav Bhandari

,

,

,

,

,

Proceedings of the Computer Vision - ECCV 2022, 2022

Deep Normalized Cross-Modal Hashing with Bi-Direction Relation Reasoning.

[DOI]

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Parallel Generative Adversarial Network for Third-person to First-person Image Generation.

[DOI]

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021

A Metamodel and Framework for Artificial General Intelligence From Theory to Practice.

[DOI]

,

,

,

Ramana Kompella

,

,

,

Jayanth Srinivasa

,

,

,

Kristinn R. Thórisson

J. Artif. Intell. Conscious., 2021

Cross-View Exocentric to Egocentric Video Synthesis.

[DOI]

,

,

,

,

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020

Exocentric to Egocentric Image Generation Via Parallel Generative Adversarial Network.

[DOI]

,

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation.

[DOI]

,

,

,

,

,

Proceedings of the 27th ACM International Conference on Multimedia, 2019

2017

Learning with Shared Information for Image and Video Analysis.

[DOI]

PhD thesis, 2017

Graph-based clustering and ranking for diversified image search.

[DOI]

,

,

,

,

Multim. Syst., 2017

2016

Active domain adaptation with noisy labels for multimedia analysis.

[DOI]

,

,

Ramanathan Subramanian

,

,

,

World Wide Web, 2016

A Multi-Task Learning Framework for Head Pose Estimation under Target Motion.

[DOI]

,

,

Ramanathan Subramanian

,

,

,

IEEE Trans. Pattern Anal. Mach. Intell., 2016

2015

Event Oriented Dictionary Learning for Complex Event Detection.

[DOI]

,

,

,

,

,

Alexander G. Hauptmann

,

IEEE Trans. Image Process., 2015

Egocentric Daily Activity Recognition via Multitask Clustering.

[DOI]

,

,

,

IEEE Trans. Image Process., 2015

Inferring Painting Style with Multi-Task Dictionary Learning.

[DOI]

,

,

,

,

,

,

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Complex Event Detection via Event Oriented Dictionary Learning.

[DOI]

,

,

,

,

,

Alexander G. Hauptmann

,

Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014

Multitask Linear Discriminant Analysis for View Invariant Action Recognition.

[DOI]

,

,

Ramanathan Subramanian

,

,

IEEE Trans. Image Process., 2014

GLocal tells you more: Coupling GLocal structural for feature selection with sparsity for image and video classification.

[DOI]

,

,

,

,

,

Comput. Vis. Image Underst., 2014

The Mystery of Faces: Investigating Face Contribution for Multimedia Event Detection.

[DOI]

,

,

,

,

Alexander G. Hauptmann

,

Proceedings of the International Conference on Multimedia Retrieval, 2014

Interactive Surveillance Event Detection through Mid-level Discriminative Representation.

[DOI]

,

,

,

,

,

,

,

,

Alexander G. Hauptmann

Proceedings of the International Conference on Multimedia Retrieval, 2014

Clustered Multi-task Linear Discriminant Analysis for View Invariant Color-Depth Action Recognition.

[DOI]

,

,

,

Ramanathan Subramanian

,

Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Minimizing dataset bias: Discriminative multi-task sparse coding through shared subspace learning for image classification.

[DOI]

,

,

,

Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Recognizing Daily Activities from First-Person Videos with Multi-task Clustering.

[DOI]

,

,

,

Proceedings of the Computer Vision - ACCV 2014, 2014

2013

GLocal structural feature selection with sparsity for multimedia data understanding.

[DOI]

,

,

,

,

Proceedings of the ACM Multimedia Conference, 2013

Multi-task linear discriminant analysis for multi-view action recognition.

[DOI]

,

,

,

Proceedings of the IEEE International Conference on Image Processing, 2013