Juncheng Li

Orcid: 0000-0003-2258-1291

Affiliations:
  • Zhejiang University, Hangzhou, China


According to our database1, Juncheng Li authored at least 55 papers between 2019 and 2025.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

2019
2020
2021
2022
2023
2024
2025
0
5
10
15
20
25
30
15
7
2
14
7
5
2

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2025
ITERATE: Image-Text Enhancement, Retrieval, and Alignment for Transmodal Evolution with LLMs.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024
RustGraph: Robust Anomaly Detection in Dynamic Graphs by Jointly Learning Structural-Temporal Dependency.
IEEE Trans. Knowl. Data Eng., July, 2024

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation.
CoRR, 2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.
CoRR, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining.
CoRR, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness.
CoRR, 2024

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation.
CoRR, 2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing.
CoRR, 2024

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training.
CoRR, 2024

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.
CoRR, 2024

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms.
CoRR, 2024

RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection.
CoRR, 2024

Align<sup>2</sup>LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.
CoRR, 2024

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.
CoRR, 2024

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation.
CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.
CoRR, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.
Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.
Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

WorldGPT: Empowering LLM as Multimodal World Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DEMON24: ACM MM24 Demonstrative Instruction Following Challenge.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fact : Teaching MLLMs with Faithful, Concise and Transferable Rationales.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DIEM: Decomposition-Integration Enhancing Multimodal Insights.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer.
CoRR, 2023

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.
CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.
CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.
CoRR, 2023

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration.
CoRR, 2023

Meta-augmented Prompt Tuning for Better Few-shot Learning.
CoRR, 2023

Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Reasoning Makes Good Annotators : An Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention.
CoRR, 2022

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval.
CoRR, 2022

Fine-Grained Semantically Aligned Vision-Language Pre-Training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-supervised Disentanglement Network for Video Fingerspelling Detection.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Walking with MIND: Mental Imagery eNhanceD Embodied QA.
Proceedings of the 27th ACM International Conference on Multimedia, 2019


  Loading...