Renrui Zhang
ORCID: 0000-0003-4503-5277
According to our database, Renrui Zhang authored at least 102 papers between 2019 and 2024.
Bibliography
2024
Int. J. Comput. Vis., May, 2024
Int. J. Comput. Vis., February, 2024
CoPESD: A Multi-Level Surgical Motion Dataset for Training Large Vision-Language Models to Co-Pilot Endoscopic Submucosal Dissection.
CoRR, 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions.
CoRR, 2024
CoRR, 2024
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024
RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation.
CoRR, 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis.
CoRR, 2024
CoRR, 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers.
CoRR, 2024
CoRR, 2024
CoRR, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
CoRR, 2024
Proceedings of the IEEE International Conference on Robotics and Automation, 2024
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation.
Proceedings of the Computer Vision - ECCV 2024, 2024
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models.
Proceedings of the Computer Vision - ECCV 2024, 2024
No Time to Train: Empowering Non-Parametric Networks for Few-Shot 3D Scene Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
FM-OV3D: Foundation Model-Based Cross-Modal Knowledge Blending for Open-Vocabulary 3D Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
CoRR, 2023
ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model.
CoRR, 2023
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models.
CoRR, 2023
CoRR, 2023
CoRR, 2023
CoRR, 2023
Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following.
CoRR, 2023
Less is More: Towards Efficient Few-shot 3D Semantic Segmentation via Training-free Networks.
CoRR, 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation.
CoRR, 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance.
CoRR, 2023
CoRR, 2023
Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023
CoRR, 2023
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023
Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Learning 3D Representations from 2D Pre-Trained Models via Image-to-Point Masked Autoencoders.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Prompt, Generate, Then Cache: Cascade of Foundation Models Makes Strong Few-Shot Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
CoRR, 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual and Language Learning.
CoRR, 2022
Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning.
CoRR, 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
CoRR, 2021
Differential Privacy Protection and Game Analysis of Intelligent Transportation Data.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Proceedings of the 32nd British Machine Vision Conference 2021, 2021
2019
A variational image segmentation method exploring both intensity means and texture patterns.
Signal Process. Image Commun., 2019