2025

AD-FM: Multimodal LLMs for Anomaly Detection via Multi-Stage Reasoning and Fine-Grained Reward Optimization.

Jingyi Liao

Yongyi Su

CoRR, August, 2025

ALOHA: Adapting Local Spatio-Temporal Context to Enhance the Audio-Visual Semantic Segmentation.

ACM Trans. Multim. Comput. Commun. Appl., June, 2025

SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding.

CoRR, June, 2025

A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations.

CoRR, June, 2025

Intra-Trajectory Consistency for Reward Modeling.

CoRR, June, 2025

Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval.

CoRR, May, 2025

MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval.

CoRR, May, 2025

VORTA: Efficient Video Diffusion via Routing Sparse Attention.

CoRR, May, 2025

T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation.

CoRR, May, 2025

Jailbreaking the Text-to-Video Generative Models.

CoRR, May, 2025

T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models.

CoRR, April, 2025

Large Language Model Agent: A Survey on Methodology, Applications and Challenges.

CoRR, March, 2025

Robust Distribution Alignment for Industrial Anomaly Detection under Distribution Shift.

CoRR, March, 2025

A Survey of Direct Preference Optimization.

CoRR, March, 2025

Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

A Survey on Efficient Large Language Model Training: From Data-centric Perspectives.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

MARK: Multi-agent Collaboration with Ranking Guidance for Text-attributed Graph Clustering.

Proceedings of the Findings of the Association for Computational Linguistics, 2025

Distribution-Consistency-Guided Multi-modal Hashing.

Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025

2024

Similarity Transitivity Broken-Aware Multi-Modal Hashing.

IEEE Trans. Knowl. Data Eng., November, 2024

AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration.

CoRR, 2024

SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing.

CoRR, 2024

Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines.

CoRR, 2024

Diffusion Model-Based Video Editing: A Survey.

CoRR, 2024

A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models.

CoRR, 2024

Deep Foreground-Background Weighted Cross-modal Hashing.

Proceedings of the Natural Language Processing and Chinese Computing, 2024

Data-Focus Proxy Hashing.

Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024

2023

Unsupervised Cross-Modal Hashing With Modality-Interaction.

IEEE Trans. Circuits Syst. Video Technol., September, 2023

Deep Cross-Modal Proxy Hashing.

IEEE Trans. Knowl. Data Eng., July, 2023

Unsupervised Cross-Modal Hashing via Semantic Text Mining.

IEEE Trans. Multim., 2023

Unsupervised Hashing with Semantic Concept Mining.

Proc. ACM Manag. Data, 2023

Global and Local Semantic Completion Learning for Vision-Language Pre-training.

CoRR, 2023

Data-Aware Proxy Hashing for Cross-modal Retrieval.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Deep Cross-Modal Hashing With Hashing Functions and Unified Hash Codes Jointly Learning.

IEEE Trans. Knowl. Data Eng., 2022

Multimodal graph neural network for video procedural captioning.

Neurocomputing, 2022

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.

CoRR, 2022

Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.

CoRR, 2022

Egocentric Video-Language Pretraining.

CoRR, 2022

HunYuan_tvr for Text-Video Retrievial.

CoRR, 2022

Egocentric Video-Language Pretraining.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Partial-Softmax Loss based Deep Hashing.

Proceedings of the WWW '21: The Web Conference 2021, 2021

Weighted Gaussian Loss based Hamming Hashing.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Hashing based Efficient Inference for Image-Text Matching.

Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

2020

Deep Cross-modal Proxy Hashing.

CoRR, 2020

MLS3RDUH: Deep Unsupervised Hashing via Manifold based Local Semantic Similarity Structure Reconstructing.

Rong-Cheng Tu

Xianling Mao

Wei Wei

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

2019

Object Detection based Deep Unsupervised Hashing.

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2018

Object Detection based Deep Unsupervised Hashing.

CoRR, 2018