Zhe Chen

Affiliations:

Nanjing University, OpenGVLab, Shanghai AI Laboratory, China

According to our database, Zhe Chen authored at least 33 papers between 2021 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Mini-InternVL: A Flexible-Transfer Pocket Multi-Modal Model with 5% Parameters and 90% Performance.

Vis. Intell., 2024

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.

CoRR, 2024

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.

CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.

CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.

CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.

CoRR, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.

CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.

CoRR, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.

CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.

CoRR, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Needle In A Multimodal Haystack.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.

Proceedings of the Computer Vision - ECCV 2024, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

CoRR, 2023

AVSegFormer: Audio-Visual Segmentation with Transformer.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

Champion Solution for the WSDM2023 Toloka VQA Challenge.

CoRR, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Graph Propagation Transformer for Graph Representation Learning.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

ELAN: Enhancing Temporal Action Detection with Location Awareness.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Vision Transformer Adapter for Dense Predictions.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

DDP: Diffusion Model for Dense Visual Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

CoRR, 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.

CoRR, 2021
