Enze Xie

ORCID: 0000-0001-6890-1049

According to our database, Enze Xie authored at least 100 papers between 2018 and 2024.

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.
IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Deeply Unsupervised Patch Re-Identification for Pre-Training Object Detectors.
IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

Lyra: Orchestrating Dual Correction in Automated Theorem Proving.
Trans. Mach. Learn. Res., 2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts.
CoRR, 2024

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving.
CoRR, 2024

Editing Massive Concepts in Text-to-Image Diffusion Models.
CoRR, 2024

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model.
CoRR, 2024

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.
CoRR, 2024

Accelerating Diffusion Sampling with Optimized Time Steps.
CoRR, 2024

On the Expressive Power of a Variant of the Looped Transformer.
CoRR, 2024

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation.
CoRR, 2024

CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects.
CoRR, 2024

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models.
CoRR, 2024

SF3D: SlowFast Temporal 3D Object Detection.
Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LEGO-Prover: Neural Theorem Proving with Growing Libraries.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MagicDrive: Street View Generation with Diverse 3D Geometry Control.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

CycleMLP: A MLP-Like Architecture for Dense Visual Predictions.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

SERF: Fine-Grained Interactive 3D Segmentation and Editing with Radiance Fields.
CoRR, 2023

A Survey of Reasoning with Foundation Models.
CoRR, 2023

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation.
CoRR, 2023

Drag-A-Video: Non-rigid Video Editing with Point-based Interaction.
CoRR, 2023

Animate124: Animating One Image to 4D Dynamic Scene.
CoRR, 2023

DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model.
CoRR, 2023

DiffFlow: A Unified SDE Framework for Score-Based Diffusion Models and Generative Adversarial Networks.
CoRR, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation.
CoRR, 2023

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt.
CoRR, 2023

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts.
CoRR, 2023

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation.
CoRR, 2023

Progressive-Hint Prompting Improves Reasoning in Large Language Models.
CoRR, 2023

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction.
CoRR, 2023

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline.
CoRR, 2023

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception.
CoRR, 2023

Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection.
Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffComplete: Diffusion-based Generative 3D Shape Completion.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's-Eye View.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DDP: Diffusion Model for Dense Visual Prediction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Beyond One-to-One: Rethinking the Referring Image Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Improving Monocular Visual Odometry Using Learned Depth.
IEEE Trans. Robotics, 2022

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

PVT v2: Improved baselines with Pyramid Vision Transformer.
Comput. Vis. Media, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.
CoRR, 2022

M²BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation.
CoRR, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers.
CoRR, 2022

WegFormer: Transformers for Weakly Supervised Semantic Segmentation.
CoRR, 2022

Understanding The Robustness in Vision Transformers.
Proceedings of the International Conference on Machine Learning, 2022

UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

CycleMLP: A MLP-like Architecture for Dense Prediction.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Polygon-Free: Unconstrained Scene Text Detection with Box Annotations.
Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.
Proceedings of the Computer Vision - ECCV 2022, 2022

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.
CoRR, 2021

Panoptic SegFormer.
CoRR, 2021

CycleMLP: A MLP-like Architecture for Dense Prediction.
CoRR, 2021

PVTv2: Improved Baselines with Pyramid Vision Transformer.
CoRR, 2021

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.
CoRR, 2021

FakeMix Augmentation Improves Transparent Object Detection.
CoRR, 2021

Unsupervised Pretraining for Object Detection by Patch Reidentification.
CoRR, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
CoRR, 2021

Trans2Seg: Transparent Object Segmentation with Transformer.
CoRR, 2021

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Segmenting Transparent Objects in the Wild with Transformer.
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

What Makes for End-to-End Object Detection?
Proceedings of the 38th International Conference on Machine Learning, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Watch Only Once: An End-to-End Video Action Detection Framework.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

TransTrack: Multiple-Object Tracking with Transformer.
CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.
CoRR, 2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training.
CoRR, 2020

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild.
CoRR, 2020

1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation.
CoRR, 2020

Segmenting Transparent Objects in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

Scene Text Image Super-Resolution in the Wild.
Proceedings of the Computer Vision - ECCV 2020, 2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting.
Proceedings of the Computer Vision - ECCV 2020, 2020

Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

PolarMask: Single Shot Instance Segmentation With Polar Representation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild.
Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

TextSR: Content-Aware Text Super-Resolution Guided by Recognition.
CoRR, 2019

Shape Robust Text Detection with Progressive Scale Expansion Network.
CoRR, 2019

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Shape Robust Text Detection With Progressive Scale Expansion Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Scene Text Detection with Supervised Pyramid Context Network.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Fast OBDD Reordering using Neural Message Passing on Hypergraph.
CoRR, 2018

Attention Cropping: A Novel Data Augmentation Method for Real-world Plant Species Identification.
CoRR, 2018

Improving Fine-Grained Object Classification Using Adversarial Generated Unlabelled Samples.
Proceedings of the Fourth IEEE International Conference on Multimedia Big Data, 2018
