We stand with Ukraine

We stand with Ukraine

Pan Zhang

Orcid: 0009-0004-7195-4159

Affiliations:

Shanghai Artificial Intelligence Laboratory, Shanghai, China
PIESAT Information Technology Co, Ltd., Beijing, China

According to our database¹, Pan Zhang authored at least 48 papers between 2023 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

Legend:

Book

In proceedings

Article

PhD thesis

Dataset

Other

Links

Online presence:

on orcid.org

On csauthors.net:

Bibliography

2025

RelightVid: Temporal-Consistent Diffusion Model for Video Relighting.

[BibT_eX]

[DOI]

,

,

Shangzhan Zhang

,

,

,

,

,

Gordon Wetzstein

,

CoRR, January, 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, January, 2025

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, January, 2025

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, January, 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

CoRR, January, 2025

2024

Large-Scale Fine-Grained Building Classification and Height Estimation for Semantic Urban Reconstruction: Outcome of the 2023 IEEE GRSS Data Fusion Contest.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Michael Schmitt

,

,

,

Claudio Persello

,

IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Xingcheng Zhang

,

,

,

,

CoRR, 2024

ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification.

[BibT_eX]

[DOI]

,

,

,

CoRR, 2024

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

CoRR, 2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Xingcheng Zhang

,

,

,

,

,

CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

CoRR, 2024

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing.

[BibT_eX]

[DOI]

,

,

,

,

,

CoRR, 2024

Unified Scene Representation and Reconstruction for 3D Large Language Models.

[BibT_eX]

[DOI]

,

,

,

,

,

CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

CoRR, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.

[BibT_eX]

[DOI]

,

,

,

,

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Xingcheng Zhang

,

,

,

CoRR, 2024

Streaming Long Video Understanding with Large Language Models.

[BibT_eX]

[DOI]

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Xingcheng Zhang

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Water Body Extraction from SAR and Multi-Source Data Using Siamese Network-Based Segmentation.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

Proceedings of the IGARSS 2024, 2024

Long-CLIP: Unlocking the Long-Text Capability of CLIP.

[BibT_eX]

[DOI]

,

,

,

,

Proceedings of the Computer Vision - ECCV 2024, 2024

ShareGPT4V: Improving Large Multi-modal Models with Better Captions.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

Proceedings of the Computer Vision - ECCV 2024, 2024

FreeDrag: Feature Dragging for Reliable Point-Based Image Editing.

[BibT_eX]

[DOI]

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever you Want.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIGC: Visual Instruction Generation and Correction.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

,

Xingcheng Zhang

,

,

,

CoRR, 2023

MLLM-DataEngine: An Iterative Refinement Approach for MLLM.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

CoRR, 2023

FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing.

[BibT_eX]

[DOI]

,

,

,

,

CoRR, 2023

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

Hgdnet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Fine-Grained Building Roof Instance Segmentation Based on Domain Adapted Pretraining and Composite Dual-Backbone.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

,

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

,

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Loading...