Renrui Zhang

ORCID: 0000-0003-4503-5277

According to our database, Renrui Zhang authored at least 102 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
Int. J. Comput. Vis., May 2024

CLIP-Adapter: Better Vision-Language Models with Feature Adapters.
Int. J. Comput. Vis., February 2024

CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
CoRR, 2024

PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024

MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines.
CoRR, 2024

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners.
CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.
CoRR, 2024

MAVIS: Mathematical Visual Instruction Tuning.
CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024

RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.
CoRR, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024

TripletMix: Triplet Data Augmentation for 3D Understanding.
CoRR, 2024

Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation.
CoRR, 2024

TerDiT: Ternary Diffusion Models with Transformers.
CoRR, 2024

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching.
CoRR, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Personalize Segment Anything Model with One Shot.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Proceedings of the Computer Vision - ECCV 2024, 2024

PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024

SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Gradient-based Parameter Selection for Efficient Fine-Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Cloud-Device Collaborative Learning for Multimodal Large Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
CoRR, 2023

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise.
CoRR, 2023

Language-Assisted 3D Scene Understanding.
CoRR, 2023

3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V.
CoRR, 2023

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023

Improving Compositional Text-to-image Generation with Large Vision-Language Models.
CoRR, 2023

NOC: High-Quality Neural Object Cloning with 3D Lifting of Segment Anything.
CoRR, 2023

RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision.
CoRR, 2023

ImageBind-LLM: Multi-modality Instruction Tuning.
CoRR, 2023

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023

Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023

Personalize Segment Anything Model with One Shot.
CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.
CoRR, 2023

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance.
CoRR, 2023

LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking.
CoRR, 2023

Nearest Neighbors Meet Deep Neural Networks for Point Cloud Analysis.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

DS-Point: A Dual-Scale 3D Framework for Point Cloud Understanding.
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisiting Event-Based Video Frame Interpolation.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

Joint-MAE: 2D-3D Joint Masked Autoencoders for 3D Point Cloud Pre-training.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PointCLIP V2: Prompting CLIP and GPT for Powerful 3D Open-world Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

SparseMAE: Sparse Training Meets Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MonoDETR: Depth-guided Transformer for Monocular 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

iQuery: Instruments as Queries for Audio-Visual Sound Separation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

CALIP: Zero-Shot Enhancement of CLIP with Parameter-Free Attention.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

Decorate the Newcomers: Visual Domain Prompt for Continual Test Time Adaptation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning.
CoRR, 2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning.
CoRR, 2022

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual and Language Learning.
CoRR, 2022

Collaboration of Pre-trained Models Makes Better Few-shot Learner.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

Can Language Understand Depth?
CoRR, 2022

POS-BERT: Point Cloud One-Stage BERT Pre-Training.
CoRR, 2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection.
CoRR, 2022

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022

Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Can Language Understand Depth?
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

Frozen CLIP Models are Efficient Video Learners.
Proceedings of the Computer Vision - ECCV 2022, 2022

Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts.
CoRR, 2021

DSPoint: Dual-scale Point Cloud Recognition with High-frequency Fusion.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

Dual-stream Network for Visual Recognition.
CoRR, 2021

Differential Privacy Protection and Game Analysis of Intelligent Transportation Data.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Dual-stream Network for Visual Recognition.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2019
A variational image segmentation method exploring both intensity means and texture patterns.
Signal Process. Image Commun., 2019
