Wei Ji

Orcid: 0000-0002-8106-9768

Affiliations:
  • National University of Singapore
  • Zhejiang University, Hangzhou, China (former)


According to our database, Wei Ji authored at least 72 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
In Defense of Clip-Based Video Relation Detection.
IEEE Trans. Image Process., 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog.
CoRR, 2024

DriveDiTFit: Fine-tuning Diffusion Transformers for Autonomous Driving.
CoRR, 2024

Described Spatial-Temporal Video Detection.
CoRR, 2024

Backpropagation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration.
CoRR, 2024

Weakly Supervised Video Moment Retrieval via Location-irrelevant Proposal Learning.
Proceedings of the Companion Proceedings of the ACM on Web Conference 2024, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.
Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

Hierarchical Debiasing and Noisy Correction for Cross-domain Video Tube Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Semantic Alignment for Multimodal Large Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SpeechEE: A Novel Benchmark for Speech Event Extraction.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Towards Small Object Editing: A Benchmark Dataset and A Training-Free Approach.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

NExT-Chat: An LMM for Chat, Detection and Segmentation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

NExT-GPT: Any-to-Any Multimodal LLM.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Domain-Wise Invariant Learning for Panoptic Scene Graph Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding.
Proceedings of the IEEE International Conference on Acoustics, 2024

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching.
Proceedings of the Computer Vision - ECCV 2024, 2024

Dysen-VDM: Empowering Dynamics-Aware Text-to-Video Diffusion with LLMs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Panoptic Scene Graph Generation with Semantics-Prototype Learning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
CoRR, 2023

Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatially Relation Matching.
CoRR, 2023

NExT-Chat: An LMM for Chat, Detection and Segmentation.
CoRR, 2023

Towards Complex-query Referring Image Segmentation: A Novel Benchmark.
CoRR, 2023

Empowering Dynamics-aware Text-to-Video Diffusion with Large Language Models.
CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.
CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.
CoRR, 2023

Transfer Visual Prompt Generator across LLMs.
CoRR, 2023

Multi-queue Momentum Contrast for Microvideo-Product Retrieval.
Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023

VPGTrans: Transfer Visual Prompt Generator across LLMs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Biased-Predicate Annotation Identification via Unbiased Visual Predicate Representation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Online Distillation-enhanced Multi-modal Transformer for Sequential Recommendation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Style-Invariant Robust Representation for Generalizable Visual Instance Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Deep Multimodal Learning for Information Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Partial Annotation-based Video Moment Retrieval via Iterative Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ART: rule bAsed futuRe-inference deducTion.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

WINNER: Weakly-supervised hIerarchical decompositioN and aligNment for spatio-tEmporal video gRounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Generating Visual Spatial Description via Holistic 3D Scene Understanding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Two Heads Are Better Than One: Improving Fake News Video Detection by Correlating with Neighbors.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Video-Audio Domain Generalization via Confounder Disentanglement.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Conditional Hyper-Network for Blind Super-Resolution With Multiple Degradations.
IEEE Trans. Image Process., 2022

Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey.
Neurocomputing, 2022

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding.
CoRR, 2022

MetaComp: Learning to Adapt for Online Depth Completion.
CoRR, 2022

3D Magic Mirror: Clothing Reconstruction from a Single Image via a Causal Perspective.
CoRR, 2022

Structured and Natural Responses Co-generation for Conversational Search.
Proceedings of the SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, July 11, 2022

Video Question Answering: Datasets, Algorithms and Challenges.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Fine-Grained Scene Graph Generation with Data Transfer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Invariant Grounding for Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Content-Variant Reference Image Quality Assessment via Knowledge Distillation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Rethinking the Two-Stage Framework for Grounded Situation Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey.
CoRR, 2021

Deconfounded Video Moment Retrieval with Causal Intervention.
Proceedings of the SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

Video Visual Relation Detection via Iterative Inference.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

VidVRD 2021: The Third Grand Challenge on Video Relation Detection.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Boundary Proposal Network for Two-stage Natural Language Video Localization.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Context-Aware Graph Label Propagation Network for Saliency Detection.
IEEE Trans. Image Process., 2020

Context-Aware Deep Spatiotemporal Network for Hand Pose Estimation From Depth Images.
IEEE Trans. Cybern., 2020

Human-Centric Clothing Segmentation via Deformable Semantic Locality-Preserving Network.
IEEE Trans. Circuits Syst. Video Technol., 2020

2019
Multi-Task Structure-Aware Context Modeling for Robust Keypoint-Based Object Tracking.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

2018
Context-Aware Deep Spatio-Temporal Network for Hand Pose Estimation from Depth Images.
CoRR, 2018

Semantic Locality-Aware Deformable Network for Clothing Segmentation.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

