Pan Zhang

ORCID: 0009-0004-7195-4159

Affiliations:
  • Shanghai Artificial Intelligence Laboratory, Shanghai, China
  • PIESAT Information Technology Co, Ltd., Beijing, China


According to our database, Pan Zhang authored at least 41 papers between 2023 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Large-Scale Fine-Grained Building Classification and Height Estimation for Semantic Urban Reconstruction: Outcome of the 2023 IEEE GRSS Data Fusion Contest.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024

ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification.
CoRR, 2024

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models.
CoRR, 2024

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing.
CoRR, 2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.
CoRR, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data.
CoRR, 2024

Streaming Long Video Understanding with Large Language Models.
CoRR, 2024

ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing.
CoRR, 2024

Unified Scene Representation and Reconstruction for 3D Large Language Models.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?
CoRR, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition.
CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.
CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Water Body Extraction from SAR and Multi-Source Data Using Siamese Network-Based Segmentation.
Proceedings of the IGARSS 2024, 2024

Long-CLIP: Unlocking the Long-Text Capability of CLIP.
Proceedings of the Computer Vision - ECCV 2024, 2024

ShareGPT4V: Improving Large Multi-modal Models with Better Captions.
Proceedings of the Computer Vision - ECCV 2024, 2024

FreeDrag: Feature Dragging for Reliable Point-Based Image Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever you Want.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

VIGC: Visual Instruction Generation and Correction.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

MLLM-DataEngine: An Iterative Refinement Approach for MLLM.
CoRR, 2023

FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing.
CoRR, 2023

HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image.
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023

HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

Fine-Grained Building Roof Instance Segmentation Based on Domain Adapted Pretraining and Composite Dual-Backbone.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

