Lewei Lu

Orcid: 0009-0009-9809-3818

According to our database¹, Lewei Lu authored at least 53 papers between 2020 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

Legend:

Book

In proceedings

Article

PhD thesis

Dataset

Other

Links

On csauthors.net:

Bibliography

2024

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance.

[BibT_eX]

[DOI]

Vis. Intell., 2024

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding.

[BibT_eX]

[DOI]

CoRR, 2024

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models.

[BibT_eX]

[DOI]

CoRR, 2024

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding.

[BibT_eX]

[DOI]

CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

[BibT_eX]

[DOI]

CoRR, 2024

HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2024

Multimodal 3D Reasoning Segmentation with Complex Scenes.

[BibT_eX]

[DOI]

CoRR, 2024

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization.

[BibT_eX]

[DOI]

CoRR, 2024

Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance.

[BibT_eX]

[DOI]

CoRR, 2024

MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity.

[BibT_eX]

[DOI]

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

[BibT_eX]

[DOI]

CoRR, 2024

VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks.

[BibT_eX]

[DOI]

CoRR, 2024

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning.

[BibT_eX]

[DOI]

CoRR, 2024

Needle In A Multimodal Haystack.

[BibT_eX]

[DOI]

CoRR, 2024

Learning 1D Causal Visual Representation with De-focus Attention Networks.

[BibT_eX]

[DOI]

CoRR, 2024

Parameter-Inverted Image Pyramid Networks.

[BibT_eX]

[DOI]

CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.

[BibT_eX]

[DOI]

CoRR, 2024

Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures.

[BibT_eX]

[DOI]

CoRR, 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer.

[BibT_eX]

[DOI]

CoRR, 2024

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization.

[BibT_eX]

[DOI]

CoRR, 2024

Learning to Prompt Segment Anything Models.

[BibT_eX]

[DOI]

CoRR, 2024

3D Data Augmentation for Driving Scenes on Camera.

[BibT_eX]

[DOI]

Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

The All-Seeing Project V2: Towards General Relation Comprehension of the Open World.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

ControlLLM: Augment Language Models with Tools by Searching on Graphs.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Weakly Supervised Monocular 3D Detection with a Single-View Image.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Intern VL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Masked AutoDecoder is Effective Multi-Task Vision Generalist.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Modeling Continuous Motion for 3D Point Cloud Object Tracking.

[BibT_eX]

[DOI]

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks.

[BibT_eX]

[DOI]

CoRR, 2023

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2023

ControlLLM: Augment Language Models with Tools by Searching on Graphs.

[BibT_eX]

[DOI]

CoRR, 2023

Exploring the Potential of Flexible 8-bit Format: Design and Algorithm.

[BibT_eX]

[DOI]

CoRR, 2023

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory.

[BibT_eX]

[DOI]

CoRR, 2023

3D Data Augmentation for Driving Scenes on Camera.

[BibT_eX]

[DOI]

CoRR, 2023

Scene as Occupancy.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Distilling Focal Knowledge from Imperfect Expert for 3D Object Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Planning-oriented Autonomous Driving.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards All-in-One Pre-Training via Maximizing Multi-Modal Mutual Information.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Goal-oriented Autonomous Driving.

[BibT_eX]

[DOI]

CoRR, 2022

Demystify Transformers & Convolutions in Modern Image Deep Networks.

[BibT_eX]

[DOI]

CoRR, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.

[BibT_eX]

[DOI]

CoRR, 2022

2021

Decoupled Spatial-Temporal Transformer for Video Inpainting.

[BibT_eX]

[DOI]

CoRR, 2021

Deformable DETR: Deformable Transformers for End-to-End Object Detection.

[BibT_eX]

[DOI]

Proceedings of the 9th International Conference on Learning Representations, 2021

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask.

[BibT_eX]

[DOI]

CoRR, 2020

VL-BERT: Pre-training of Generic Visual-Linguistic Representations.

[BibT_eX]

[DOI]

Proceedings of the 8th International Conference on Learning Representations, 2020

Lewei Lu

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...