Renrui Zhang

ORCID: 0000-0003-4503-5277

According to our database, Renrui Zhang authored at least 102 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
Int. J. Comput. Vis., May 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February 2024

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.
CoRR, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners.
CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.
CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.
CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.
CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024

TripletMix: Triplet Data Augmentation for 3D Understanding.
CoRR, 2024

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation.
CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching.
CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Proceedings of the Computer Vision - ECCV 2024, 2024

PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Gradient-based Parameter Selection for Efficient Fine-Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Cloud-Device Collaborative Learning for Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

Language-Assisted 3D Scene Understanding.
CoRR, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V.
CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

Improving Compositional Text-to-image Generation with Large Vision-Language Models.
CoRR, 2023

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything.
CoRR, 2023

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

DS-Point: A Dual-Scale 3D Framework for Point Cloud Understanding.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisiting Event-Based Video Frame Interpolation.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

iQuery: Instruments as Queries for Audio-Visual Sound Separation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning.
CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual and Language Learning.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Can Language Understand Depth?
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Can Language Understand Depth?
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts.
CoRR, 2021

DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

Dual-stream Network for Visual Recognition.
CoRR, 2021

Differential Privacy Protection and Game Analysis of Intelligent Transportation Data.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Dual-stream Network for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2019
A variational image segmentation method exploring both intensity means and texture patterns.
Signal Process. Image Commun., 2019
