Ran Xu

Orcid: 0009-0004-4585-5261

Affiliations:
  • Salesforce Research, Salesforce AI Research


According to our database, Ran Xu authored at least 52 papers between 2012 and 2024.

Bibliography

2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs.
CoRR, 2024

Trust but Verify: Programmatic VLM Evaluation in the Wild.
CoRR, 2024

xLAM: A Family of Large Action Models to Empower AI Agent Systems.
CoRR, 2024

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations.
CoRR, 2024

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models.
CoRR, 2024

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens.
CoRR, 2024

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases.
CoRR, 2024

Text2Data: Low-Resource Data Generation with Textual Control.
CoRR, 2024

TrustLLM: Trustworthiness in Large Language Models.
CoRR, 2024

Hierarchical Point Attention for Indoor 3D Object Detection.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024


Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer.
Proceedings of the Computer Vision - ECCV 2024, 2024

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant.
Proceedings of the Computer Vision - ECCV 2024, 2024

X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.
Proceedings of the Computer Vision - ECCV 2024, 2024

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HIVE: Harnessing Human Feedback for Instructional Visual Editing.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.
CoRR, 2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents.
CoRR, 2023

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.
CoRR, 2023

REX: Rapid Exploration and eXploitation for AI Agents.
CoRR, 2023

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding.
CoRR, 2023

Model-Agnostic Hierarchical Attention for 3D Object Detection.
CoRR, 2023

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Robustness Evaluation of Transformer-Based Form Field Extractors via Form Attacks.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mask-Free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Tackling Data Heterogeneity in Federated Learning with Class Prototypes.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding.
CoRR, 2022

Burn After Reading: Online Adaptation for Cross-domain Streaming Data.
Proceedings of the Computer Vision - ECCV 2022, 2022

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.
Proceedings of the Computer Vision - ECCV 2022, 2022

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents.
Proceedings of the 29th International Conference on Computational Linguistics, 2022

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Value Retrieval with Arbitrary Queries for Form-like Documents.
CoRR, 2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes.
CoRR, 2021

Field Extraction from Forms with Unlabeled Data.
CoRR, 2021

Proposal Learning for Semi-Supervised Object Detection.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Proposal Learning for Semi-Supervised Object Detection.
CoRR, 2020

2019
Context-aware Active Multi-Step Reinforcement Learning.
CoRR, 2019

2018
Deep ranking structural support vector machine for image tagging.
Pattern Recognit. Lett., 2018

2016
Sequential Labeling with Online Deep Learning: Exploring Model Initialization.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2016

2015
Human action segmentation with hierarchical supervoxel consistency.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Compositional Structure Learning for Action Understanding.
CoRR, 2014

Actionness Ranking with Lattice Conditional Ordinal Random Fields.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2012
Random forests for metric learning with implicit pairwise position dependence.
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012

Combining Skeletal Pose with Local Motion for Human Activity Recognition.
Proceedings of the Articulated Motion and Deformable Objects, 2012

