SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding.
CoRR, June, 2025
A Survey of Automatic Evaluation Methods on Text, Visual and Speech Generations.
CoRR, June, 2025
Intra-Trajectory Consistency for Reward Modeling.
CoRR, June, 2025
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval.
CoRR, May, 2025
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval.
CoRR, May, 2025
VORTA: Efficient Video Diffusion via Routing Sparse Attention.
CoRR, May, 2025
T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation.
CoRR, May, 2025
Jailbreaking the Text-to-Video Generative Models.
CoRR, May, 2025
T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models.
CoRR, April, 2025
Large Language Model Agent: A Survey on Methodology, Applications and Challenges.
CoRR, March, 2025
Robust Distribution Alignment for Industrial Anomaly Detection under Distribution Shift.
CoRR, March, 2025
A Survey of Direct Preference Optimization.
CoRR, March, 2025
Distribution-Consistency-Guided Multi-modal Hashing.
Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, February 25, 2025
Similarity Transitivity Broken-Aware Multi-Modal Hashing.
IEEE Trans. Knowl. Data Eng., November, 2024
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration.
CoRR, 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing.
CoRR, 2024
Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines.
CoRR, 2024
Automatic Evaluation for Text-to-image Generation: Task-decomposed Framework, Distilled Training, and Meta-evaluation Benchmark.
CoRR, 2024
Diffusion Model-Based Video Editing: A Survey.
CoRR, 2024
A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models.
CoRR, 2024
Deep Foreground-Background Weighted Cross-modal Hashing.
Proceedings of the Natural Language Processing and Chinese Computing, 2024
Data-Focus Proxy Hashing.
Proceedings of the 27th International Conference on Computer Supported Cooperative Work in Design, 2024
Unsupervised Cross-Modal Hashing With Modality-Interaction.
IEEE Trans. Circuits Syst. Video Technol., September, 2023
Deep Cross-Modal Proxy Hashing.
IEEE Trans. Knowl. Data Eng., July, 2023
Unsupervised Cross-Modal Hashing via Semantic Text Mining.
IEEE Trans. Multim., 2023
Unsupervised Hashing with Semantic Concept Mining.
Proc. ACM Manag. Data, 2023
Global and Local Semantic Completion Learning for Vision-Language Pre-training.
CoRR, 2023
Data-Aware Proxy Hashing for Cross-modal Retrieval.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Deep Cross-Modal Hashing With Hashing Functions and Unified Hash Codes Jointly Learning.
IEEE Trans. Knowl. Data Eng., 2022
Multimodal graph neural network for video procedural captioning.
Neurocomputing, 2022
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.
CoRR, 2022
Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022.
CoRR, 2022
HunYuan_tvr for Text-Video Retrievial.
CoRR, 2022
Egocentric Video-Language Pretraining.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Partial-Softmax Loss based Deep Hashing.
Proceedings of the WWW '21: The Web Conference 2021, 2021
Weighted Gaussian Loss based Hamming Hashing.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Hashing based Efficient Inference for Image-Text Matching.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
Deep Cross-modal Proxy Hashing.
CoRR, 2020
MLS3RDUH: Deep Unsupervised Hashing via Manifold based Local Semantic Similarity Structure Reconstructing.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020
Object Detection based Deep Unsupervised Hashing.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Object Detection based Deep Unsupervised Hashing.
CoRR, 2018