Pan Zhang
Orcid: 0009-0004-7195-4159
Affiliations:
- Shanghai Artificial Intelligence Laboratory, Shanghai, China
- PIESAT Information Technology Co, Ltd., Beijing, China
According to our database, Pan Zhang authored at least 48 papers between 2023 and 2025.
Links
Online presence:
- on orcid.org
Bibliography
2025
CoRR, January, 2025
CoRR, January, 2025
CoRR, January, 2025
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning.
CoRR, January, 2025
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction.
CoRR, January, 2025
2024
Large-Scale Fine-Grained Building Classification and Height Estimation for Semantic Urban Reconstruction: Outcome of the 2023 IEEE GRSS Data Fusion Contest.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.
CoRR, 2024
ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification.
CoRR, 2024
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models.
CoRR, 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing.
CoRR, 2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
CoRR, 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024
SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024
CoRR, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024
CoRR, 2024
CoRR, 2024
CoRR, 2024
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.
CoRR, 2024
DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Water Body Extraction from SAR and Multi-Source Data Using Siamese Network-Based Segmentation.
Proceedings of the IGARSS 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023
FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing.
CoRR, 2023
Proceedings of the SIGGRAPH Asia 2023 Conference Papers, 2023
HGDNet: A Height-Hierarchy Guided Dual-Decoder Network for Single View Building Extraction and Height Estimation.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Fine-Grained Building Roof Instance Segmentation Based on Domain Adapted Pretraining and Composite Dual-Backbone.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023