Hang Xu

ORCID: 0000-0003-3645-8972

Affiliations:
  • Huawei Noah's Ark Lab, Shanghai, China
  • Hong Kong University (PhD 2018)


According to our database, Hang Xu authored at least 152 papers between 2018 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
A Survey on Video Diffusion Models.
ACM Comput. Surv., February, 2025

2024
Correctable Landmark Discovery via Large Models for Vision-Language Navigation.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

Fine-Grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection.
IEEE Trans. Neural Networks Learn. Syst., November, 2024

Deeply Unsupervised Patch Re-Identification for Pre-Training Object Detectors.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation.
CoRR, 2024

UNIT: Unifying Image and Text Recognition in One Vision Encoder.
CoRR, 2024

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation.
CoRR, 2024

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models.
CoRR, 2024

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding.
CoRR, 2024

Collaborative Novel Object Discovery and Box-Guided Cross-Modal Alignment for Open-Vocabulary 3D Object Detection.
CoRR, 2024

LaneCorrect: Self-supervised Lane Detection.
CoRR, 2024

OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation.
CoRR, 2024

NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning.
CoRR, 2024

From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs.
CoRR, 2024

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models.
CoRR, 2024

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Ins-DetCLIP: Aligning Detection Model to Follow Human-Language Instruction.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextField3D: Towards Enhancing Open-Vocabulary 3D Generation with Noisy Text Fields.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MagDiff: Multi-alignment Diffusion for High-Fidelity Video Generation and Editing.
Proceedings of the Computer Vision - ECCV 2024, 2024

Reason2Drive: Towards Interpretable and Chain-Based Reasoning for Autonomous Driving.
Proceedings of the Computer Vision - ECCV 2024, 2024

PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion.
Proceedings of the Computer Vision - ECCV 2024, 2024

Implicit Concept Removal of Diffusion Models.
Proceedings of the Computer Vision - ECCV 2024, 2024

JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation.
Proceedings of the Computer Vision - ECCV 2024, 2024

LayerDiff: Exploring Text-Guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation.
Proceedings of the Computer Vision - ECCV 2024, 2024

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-Fine Pose-Reversible Guidance.
Proceedings of the Computer Vision - ECCV 2024, 2024

Self-Adaptive Reality-Guided Diffusion for Artifact-Free Super-Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DetCLIPv3: Towards Versatile Generative Open-Vocabulary Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation.
Findings of the Association for Computational Linguistics, 2024

Any-Size-Diffusion: Toward Efficient Text-Driven Synthesis for Any-Size HD Images.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Point-Guided Contrastive Learning for Monocular 3-D Object Detection.
IEEE Trans. Cybern., 2023

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning.
CoRR, 2023

G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model.
CoRR, 2023

DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance.
CoRR, 2023

VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model.
CoRR, 2023

Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models.
CoRR, 2023

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis.
CoRR, 2023

Geom-Erasing: Geometry-Driven Removal of Implicit Concept in Diffusion Models.
CoRR, 2023

HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving.
CoRR, 2023

Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation.
CoRR, 2023

MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation.
CoRR, 2023

Boosting Text-to-Image Diffusion Models with Fine-Grained Semantic Rewards.
CoRR, 2023

Boosting Visual-Language Models by Exploiting Hard Samples.
CoRR, 2023

Towards Medical Artificial General Intelligence via Knowledge-Enhanced Multimodal Pretraining.
CoRR, 2023

Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving.
CoRR, 2023

Topology Reasoning for Driving Scenes.
CoRR, 2023

Towards Universal Vision-language Omni-supervised Segmentation.
CoRR, 2023

Entity-Level Text-Guided Image Manipulation.
CoRR, 2023

OpenLane-V2: A Topology Reasoning Benchmark for Unified 3D HD Mapping.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SUIT: Learning Significance-Guided Information for 3D Temporal Detection.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023

ViewCo: Discovering Text-Supervised Segmentation Masks via Multi-View Semantic Consistency.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CO3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PIDRo: Parallel Isomeric Attention with Dynamic Routing for Text-Video Retrieval.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DetGPT: Detect What You Need via Reasoning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ConQueR: Query Contrast Voxel-DETR for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Mixed Autoencoder for Self-Supervised Visual Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

NLIP: Noise-Robust Language-Image Pre-training.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Exploring Visual Interpretability for Contrastive Language-Image Pre-training.
CoRR, 2022

Softmax-free Linear Transformers.
CoRR, 2022

CO³: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving.
CoRR, 2022

ZeroGen+: Self-Guided High-Quality Data Generation in Efficient Zero-Shot Learning.
CoRR, 2022

Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework.
CoRR, 2022

DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

FILIP: Fine-grained Interactive Language-Image Pre-Training.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Revisiting Over-smoothing in BERT from the Perspective of Graph.
Proceedings of the Tenth International Conference on Learning Representations, 2022

ZeroGen: Efficient Zero-shot Learning via Dataset Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction.
Proceedings of the Computer Vision - ECCV 2022, 2022

Generative Negative Text Replay for Continual Vision-Language Pretraining.
Proceedings of the Computer Vision - ECCV 2022, 2022

RCLane: Relay Chain Prediction for Lane Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

Learning Ego 3D Representation as Ray Tracing.
Proceedings of the Computer Vision - ECCV 2022, 2022

Open-World Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding.
Proceedings of the Computer Vision - ECCV 2022, 2022

CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving.
Proceedings of the Computer Vision - ECCV 2022, 2022

MPPNet: Multi-frame Feature Intertwining with Proxy Points for 3D Temporal Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ONCE-3DLanes: Building Monocular 3D Lane Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Point2Seq: Detecting 3D Objects as Sequences.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

ManiTrans: Entity-Level Text-Guided Image Manipulation via Token-wise Semantic Alignment and Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Arch-Graph: Acyclic Architecture Relation Predictor for Task-Transferable Neural Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Laneformer: Object-Aware Row-Column Transformers for Lane Detection.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

AutoBERT-Zero: Evolving BERT Backbone from Scratch.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving.
CoRR, 2021

One Million Scenes for Autonomous Driving: ONCE Dataset.
CoRR, 2021

BWCP: Probabilistic Learning-to-Prune Channels for ConvNets via Batch Whitening.
CoRR, 2021

Unsupervised Pretraining for Object Detection by Patch Reidentification.
CoRR, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
CoRR, 2021

Trans2Seg: Transparent Object Segmentation with Transformer.
CoRR, 2021

Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

One Million Scenes for Autonomous Driving: ONCE Dataset.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

SOFT: Softmax-free Transformer with Linear Complexity.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Segmenting Transparent Objects in the Wild with Transformer.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

SparseBERT: Rethinking the Importance Analysis in Self-attention.
Proceedings of the 38th International Conference on Machine Learning, 2021

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search.
Proceedings of the 9th International Conference on Learning Representations, 2021

Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-Modal Pretraining.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Voxel Transformer for 3D Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

MultiSiam: Self-supervised Multi-instance Siamese Representation Learning for Autonomous Driving.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Adversarial Robustness for Unsupervised Domain Adaptation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

C³-SemiSeg: Contrastive Semi-supervised Segmentation via Cross-set Learning and Dynamic Class-balancing.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation.
Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Effective Sparsification of Neural Networks With Global Sparsity Constraint.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Joint-DetNAS: Upgrade Your Detector With NAS, Pruning and Dynamic Distillation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

TransNAS-Bench-101: Improving Transferability and Generalizability of Cross-Task Neural Architecture Search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Towards Dynamic and Scalable Active Learning with Neural Architecture Adaption for Object Detection.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

How to Save your Annotation Cost for Panoptic Segmentation?
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
VEGA: Towards an End-to-End Configurable AutoML Pipeline.
CoRR, 2020

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

CurveLane-NAS: Unifying Lane-Sensitive Architecture Search and Adaptive Point Blending.
Proceedings of the Computer Vision - ECCV 2020, 2020

AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling.
Proceedings of the Computer Vision - ECCV 2020, 2020

CATCH: Context-Based Meta Reinforcement Learning for Transferrable Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

SP-NAS: Serial-to-Parallel Backbone Search for Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SM-NAS: Structural-to-Modular Neural Architecture Search for Object Detection.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

ElixirNet: Relation-Aware Network Architecture Adaptation for Medical Lesion Detection.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

EHSOD: CAM-Guided End-to-End Hybrid-Supervised Object Detection with Cascade Refinement.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Rank aggregation using latent-scale distance-based models.
Stat. Comput., 2019

Multi-objective Neural Architecture Search via Predictive Network Performance Optimization.
CoRR, 2019

Auto-FPN: Automatic Network Architecture Adaptation for Object Detection Beyond Classification.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Reasoning-RCNN: Unifying Adaptive Global Reasoning Into Large-Scale Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Spatial-Aware Graph Relation Network for Large-Scale Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Angle-based models for ranking data.
Comput. Stat. Data Anal., 2018

Hybrid Knowledge Routed Modules for Large-scale Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
