Zhe Chen

Affiliations:
  • Nanjing University, OpenGVLab, Shanghai AI Laboratory, China


According to our database, Zhe Chen authored at least 33 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.
Vis. Intell., 2024

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.
CoRR, 2024

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.
CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.
CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.
CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.
CoRR, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.
CoRR, 2024

Needle In A Multimodal Haystack.
CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
CoRR, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.
CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.
CoRR, 2024

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.
Proceedings of the Computer Vision - ECCV 2024, 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.
CoRR, 2023

AVSegFormer: Audio-Visual Segmentation with Transformer.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

Champion Solution for the WSDM2023 Toloka VQA Challenge.
CoRR, 2023

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Graph Propagation Transformer for Graph Representation Learning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

ELAN: Enhancing Temporal Action Detection with Location Awareness.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Vision Transformer Adapter for Dense Predictions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DDP: Diffusion Model for Dense Visual Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.
CoRR, 2021

