Ran Xu

Ning Yu

Proceedings of the IEEE International Conference on Robotics and Automation, 2024

Position: TrustLLM: Trustworthiness in Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer.

Proceedings of the Computer Vision - ECCV 2024, 2024

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant.

Proceedings of the Computer Vision - ECCV 2024, 2024

X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning.

Proceedings of the Computer Vision - ECCV 2024, 2024

ULIP-2: Towards Scalable Multimodal Pre-Training for 3D Understanding.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

HIVE: Harnessing Human Feedback for Instructional Visual Editing.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning.

CoRR, 2023

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents.

CoRR, 2023

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization.

CoRR, 2023

REX: Rapid Exploration and eXploitation for AI Agents.

CoRR, 2023

ULIP-2: Towards Scalable Multimodal Pre-training for 3D Understanding.

CoRR, 2023

Model-Agnostic Hierarchical Attention for 3D Object Detection.

Manli Shu

Ning Yu

Juan Carlos Niebles

Caiming Xiong

CoRR, 2023

UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Robustness Evaluation of Transformer-Based Form Field Extractors via Form Attacks.

Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding.

Mingfei Gao

Chen Xing

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mask-Free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Tackling Data Heterogeneity in Federated Learning with Class Prototypes.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding.

Mingfei Gao

Chen Xing

CoRR, 2022

Burn After Reading: Online Adaptation for Cross-domain Streaming Data.

Proceedings of the Computer Vision - ECCV 2022, 2022

Open Vocabulary Object Detection with Pseudo Bounding-Box Labels.

Proceedings of the Computer Vision - ECCV 2022, 2022

Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

DocQueryNet: Value Retrieval with Arbitrary Queries for Form-like Documents.

Proceedings of the 29th International Conference on Computational Linguistics, 2022

TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation.

Jun Wang

Mingfei Gao

Yuqian Hu

Ramprasaath R. Selvaraju

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Value Retrieval with Arbitrary Queries for Form-like Documents.

CoRR, 2021

Towards Open Vocabulary Object Detection without Human-provided Bounding Boxes.

CoRR, 2021

Field Extraction from Forms with Unlabeled Data.

CoRR, 2021

Proposal Learning for Semi-Supervised Object Detection.

Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Proposal Learning for Semi-Supervised Object Detection.

CoRR, 2020

2019

Context-aware Active Multi-Step Reinforcement Learning.

Gang Chen

Dingcheng Li

CoRR, 2019

2018

Deep ranking structural support vector machine for image tagging.

Gang Chen

Zhi Yang

Pattern Recognit. Lett., 2018

2016

Sequential Labeling with Online Deep Learning: Exploring Model Initialization.

Gang Chen

Sargur N. Srihari

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2016

2015

Human action segmentation with hierarchical supervoxel consistency.

Jiasen Lu