Juncheng Li

ORCID: 0000-0003-2258-1291

Affiliations:

Zhejiang University, Hangzhou, China

According to our database, Juncheng Li authored at least 55 papers between 2019 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

ITERATE: Image-Text Enhancement, Retrieval, and Alignment for Transmodal Evolution with LLMs.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024

RustGraph: Robust Anomaly Detection in Dynamic Graphs by Jointly Learning Structural-Temporal Dependency.

IEEE Trans. Knowl. Data Eng., July, 2024

MAKIMA: Tuning-free Multi-Attribute Open-domain Video Editing via Mask-Guided Attention Modulation.

CoRR, 2024

Boosting Private Domain Understanding of Efficient MLLMs: A Tuning-free, Adaptive, Universal Prompt Optimization Framework.

CoRR, 2024

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining.

CoRR, 2024

Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness.

CoRR, 2024

SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation.

CoRR, 2024

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing.

CoRR, 2024

STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training.

CoRR, 2024

AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea.

CoRR, 2024

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms.

CoRR, 2024

RADAR: Robust Two-stage Modality-incomplete Industrial Anomaly Detection.

CoRR, 2024

Align²LLaVA: Cascaded Human and Large Language Model Preference Alignment for Multi-modal Instruction Curation.

CoRR, 2024

TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition.

CoRR, 2024

LASER: Tuning-Free LLM-Driven Attention Control for Efficient Text-conditioned Image-to-Animation.

CoRR, 2024

HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models.

CoRR, 2024

I3: Intent-Introspective Retrieval Conditioned on Instructions.

Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024

Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Unified Generative and Discriminative Training for Multi-modal Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

The 2nd International Workshop on Deep Multi-modal Generation and Retrieval.

Proceedings of the 2nd International Workshop on Deep Multimodal Generation and Retrieval, 2024

WorldGPT: Empowering LLM as Multimodal World Model.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DEMON24: ACM MM24 Demonstrative Instruction Following Challenge.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Auto-Encoding Morph-Tokens for Multimodal LLM.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DIEM: Decomposition-Integration Enhancing Multimodal Insights.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer.

CoRR, 2023

De-fine: Decomposing and Refining Visual Programs with Auto-Feedback.

CoRR, 2023

ControlRetriever: Harnessing the Power of Instructions for Controllable Retrieval.

CoRR, 2023

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions.

CoRR, 2023

Interactive Data Synthesis for Systematic Vision Adaptation via LLMs-AIGCs Collaboration.

CoRR, 2023

Meta-augmented Prompt Tuning for Better Few-shot Learning.

CoRR, 2023

Unsupervised Domain Adaptation for Video Object Grounding with Cascaded Debiasing Learning.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Gradient-Regulated Meta-Prompt Learning for Generalizable Vision-Language Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Reasoning Makes Good Annotators: An Automatic Task-specific Rules Distilling Framework for Low-resource Relation Extraction.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Are Binary Annotations Sufficient? Video Moment Retrieval via Hierarchical Uncertainty-based Active Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

DBA: Efficient Transformer with Dynamic Bilinear Low-Rank Attention.

CoRR, 2022

BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval.

CoRR, 2022

Fine-Grained Semantically Aligned Vision-Language Pre-Training.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Weakly-supervised Disentanglement Network for Video Fingerspelling Detection.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-Based Image Captioning.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling.

Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Walking with MIND: Mental Imagery eNhanceD Embodied QA.

Proceedings of the 27th ACM International Conference on Multimedia, 2019
