Pan Zhang
ORCID: 0009-0004-7195-4159
Affiliations:
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- PIESAT Information Technology Co, Ltd., Beijing, China
According to our database, Pan Zhang authored at least 41 papers between 2023 and 2024.
Bibliography
2024
Large-Scale Fine-Grained Building Classification and Height Estimation for Semantic Urban Reconstruction: Outcome of the 2023 IEEE GRSS Data Fusion Contest.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024
ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification.
CoRR, 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models.
CoRR, 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing.
CoRR, 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
CoRR, 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024
CoRR, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.
CoRR, 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
CoRR, 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.
CoRR, 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Water Body Extraction from SAR and Multi-Source Data Using Siamese Network-Based Segmentation.
Proceedings of the IGARSS 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023
FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing.
CoRR, 2023
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023
HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Fine-Grained Building Roof Instance Segmentation Based on Domain Adapted Pretraining and Composite Dual-Backbone.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023