2025
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills.
CoRR, April, 2025

Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model.
CoRR, March, 2025

Compositional Caching for Training-free Open-vocabulary Attribute Detection.
CoRR, March, 2025

ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction.
CoRR, March, 2025

Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-tuning.
CoRR, March, 2025

Attention Reveals More Than Tokens: Training-Free Long-Context Reasoning with Attention-guided Retrieval.
CoRR, March, 2025

Towards Vector Optimization on Low-Dimensional Vector Symbolic Architecture.
CoRR, February, 2025

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1.
CoRR, February, 2025

A First-order Generative Bilevel Optimization Framework for Diffusion Models.
CoRR, February, 2025

Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025

ThermoHands: A Benchmark for 3D Hand Pose Estimation from Egocentric Thermal Images.
Proceedings of the 23rd ACM Conference on Embedded Networked Sensor Systems, 2025

Investigating the Shortcomings of LLMs in Step-by-Step Legal Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, New Mexico, USA, April 29, 2025

Quantized-ViT Efficient Training via Fisher Matrix Regularization.
Proceedings of the MultiMedia Modeling, 2025

UniMuMo: Unified Text, Music, and Motion Generation.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024
Forget Vectors at Play: Universal Input Perturbations Driving Machine Unlearning in Image Classification.
CoRR, 2024

Diverse Score Distillation.
CoRR, 2024

Safeguarding Text-to-Image Generation via Inference-Time Prompt-Noise Optimization.
CoRR, 2024

QUOTA: Quantifying Objects with Text-to-Image Models for Any Domain.
CoRR, 2024

UOE: Unlearning One Expert Is Enough For Mixture-of-experts LLMs.
CoRR, 2024

Prompt Diffusion Robustifies Any-Modality Prompt Learning.
CoRR, 2024

Understanding Matrix Function Normalizations in Covariance Pooling through the Lens of Riemannian Geometry.
CoRR, 2024

Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization.
CoRR, 2024

MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection.
CoRR, 2024

A Survey on Large Language Model-Based Game Agents.
CoRR, 2024

Training-Free Semantic Segmentation via LLM-Supervision.
CoRR, 2024

Urban Scene Diffusion through Semantic Occupancy Map.
CoRR, 2024

Adaptive Deep Neural Network Inference Optimization with EENet.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

UnlearnCanvas: Stylized Image Dataset for Enhanced Machine Unlearning Evaluation in Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

From Trojan Horses to Castle Walls: Unveiling Bilateral Data Poisoning Effects in Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Advancing the Robustness of Large Language Models through Self-Denoised Smoothing.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Short Papers, 2024

Boosting Online 3D Multi-Object Tracking through Camera-Radar Cross Check.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

CenterRadarNet: Joint 3D Object Detection and Tracking Framework Using 4D FMCW Radar.
Proceedings of the IEEE International Conference on Image Processing, 2024

A Method for Bilevel Optimization with Convex Lower-Level Problem.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Variance Reduction Can Improve Trade-Off in Multi-Objective Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Open-world Multi-label Text Classification with Extremely Weak Supervision.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Self-adapting Large Visual-Language Models to Edge Devices Across Visual Modalities.
Proceedings of the Computer Vision - ECCV 2024, 2024

Enhancing Post-Training Quantization Calibration Through Contrastive Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Efficient Multitask Dense Predictor via Binarization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Riemannian Multinomial Logistics Regression for SPD Neural Networks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Answer is All You Need: Instruction-following Text Embedding via Answering the Question.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

WaveFormer: Wavelet Transformer for Noise-Robust Video Inpainting.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
From Trojan Horses to Castle Walls: Unveiling Bilateral Backdoor Effects in Diffusion Models.
CoRR, 2023

CenterRadarNet: Joint 3D Object Detection and Tracking Framework using 4D FMCW Radar.
CoRR, 2023

Fast and Resource-Efficient Object Tracking on Edge Devices: A Measurement Study.
CoRR, 2023

A²Nav: Action-Aware Zero-Shot Robot Navigation by Exploiting Vision-and-Language Ability of Foundation Models.
CoRR, 2023

Riemannian Multiclass Logistics Regression for SPD Neural Networks.
CoRR, 2023

Model Sparsification Can Simplify Machine Unlearning.
CoRR, 2023

Optical Flow Estimation in 360° Videos: Dataset, Model and Application.
CoRR, 2023

EENet: Learning to Early Exit for Adaptive Inference.
CoRR, 2023

Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Graph Mixture of Experts: Learning on Large-Scale Graphs with Explicit Diversity Modeling.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Model Sparsity Can Simplify Machine Unlearning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Spatially-Aware Human-Object Interaction Detection with Cross-Modal Enhancement.
Proceedings of the Neural Information Processing - 30th International Conference, 2023

Causal-DFQ: Causality Guided Data-free Network Quantization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Network Specialization via Feature-level Knowledge Distillation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Amplifying Object Tracking Performance on Edge Devices.
Proceedings of the 5th IEEE International Conference on Cognitive Machine Intelligence, 2023

2022
Learning Omnidirectional Flow in 360-degree Video via Siamese Representation.
CoRR, 2022

Learning Omnidirectional Flow in 360° Video via Siamese Representation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Deep Normalized Cross-Modal Hashing with Bi-Direction Relation Reasoning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Parallel Generative Adversarial Network for Third-person to First-person Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021
A Metamodel and Framework for Artificial General Intelligence From Theory to Practice.
J. Artif. Intell. Conscious., 2021

Cross-View Exocentric to Egocentric Video Synthesis.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020
Exocentric to Egocentric Image Generation Via Parallel Generative Adversarial Network.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Cycle In Cycle Generative Adversarial Networks for Keypoint-Guided Image Generation.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

2017
Learning with Shared Information for Image and Video Analysis.
PhD thesis, 2017

Graph-based clustering and ranking for diversified image search.
Multim. Syst., 2017

2016
Active domain adaptation with noisy labels for multimedia analysis.
World Wide Web, 2016

A Multi-Task Learning Framework for Head Pose Estimation under Target Motion.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

2015
Event Oriented Dictionary Learning for Complex Event Detection.
IEEE Trans. Image Process., 2015

Egocentric Daily Activity Recognition via Multitask Clustering.
IEEE Trans. Image Process., 2015

Inferring Painting Style with Multi-Task Dictionary Learning.
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015

Complex Event Detection via Event Oriented Dictionary Learning.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Multitask Linear Discriminant Analysis for View Invariant Action Recognition.
IEEE Trans. Image Process., 2014

GLocal tells you more: Coupling GLocal structural for feature selection with sparsity for image and video classification.
Comput. Vis. Image Underst., 2014

The Mystery of Faces: Investigating Face Contribution for Multimedia Event Detection.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Interactive Surveillance Event Detection through Mid-level Discriminative Representation.
Proceedings of the International Conference on Multimedia Retrieval, 2014

Clustered Multi-task Linear Discriminant Analysis for View Invariant Color-Depth Action Recognition.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Minimizing dataset bias: Discriminative multi-task sparse coding through shared subspace learning for image classification.
Proceedings of the 2014 IEEE International Conference on Image Processing, 2014

Recognizing Daily Activities from First-Person Videos with Multi-task Clustering.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
GLocal structural feature selection with sparsity for multimedia data understanding.
Proceedings of the ACM Multimedia Conference, 2013

Multi-task linear discriminant analysis for multi-view action recognition.
Proceedings of the IEEE International Conference on Image Processing, 2013