Enze Xie

Orcid: 0000-0001-6890-1049

According to our database¹, Enze Xie authored at least 108 papers between 2018 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2025

BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera via Spatiotemporal Transformers.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2025

SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer.

CoRR, January, 2025

2024

Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2024

DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model.

IEEE Robotics Autom. Lett., October, 2024

Delving Into the Devils of Bird's-Eye-View Perception: A Review, Evaluation and Recipe.

IEEE Trans. Pattern Anal. Mach. Intell., April, 2024

Deeply Unsupervised Patch Re-Identification for Pre-Training Object Detectors.

IEEE Trans. Pattern Anal. Mach. Intell., March, 2024

Lyra: Orchestrating Dual Correction in Automated Theorem Proving.

Trans. Mach. Learn. Res., 2024

Char-SAM: Turning Segment Anything Model into Scene Text Segmentation Annotator with Character-level Visual Prompts.

CoRR, 2024

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models.

CoRR, 2024

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer.

CoRR, 2024

Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models.

CoRR, 2024

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers.

CoRR, 2024

VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation.

CoRR, 2024

DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving.

CoRR, 2024

Editing Massive Concepts in Text-to-Image Diffusion Models.

CoRR, 2024

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model.

CoRR, 2024

On the Expressive Power of a Variant of the Looped Transformer.

CoRR, 2024

Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation.

CoRR, 2024

CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects.

CoRR, 2024

PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models.

CoRR, 2024

SF3D: SlowFast Temporal 3D Object Detection.

Proceedings of the IEEE Intelligent Vehicles Symposium, 2024

DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

LEGO-Prover: Neural Theorem Proving with Growing Libraries.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

MagicDrive: Street View Generation with Diverse 3D Geometry Control.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts.

Proceedings of the Computer Vision - ECCV 2024, 2024

PIXART-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

Accelerating Diffusion Sampling with Optimized Time Steps.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

CycleMLP: A MLP-Like Architecture for Dense Visual Predictions.

IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

SERF: Fine-Grained Interactive 3D Segmentation and Editing with Radiance Fields.

CoRR, 2023

A Survey of Reasoning with Foundation Models.

CoRR, 2023

Drag-A-Video: Non-rigid Video Editing with Point-based Interaction.

CoRR, 2023

Animate124: Animating One Image to 4D Dynamic Scene.

CoRR, 2023

Large Language Models as Automated Aligners for benchmarking Vision-Language Models.

CoRR, 2023

LEGO-Prover: Neural Theorem Proving with Growing Libraries.

CoRR, 2023

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis.

CoRR, 2023

DiffFlow: A Unified SDE Framework for Score-Based Diffusion Models and Generative Adversarial Networks.

CoRR, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation.

CoRR, 2023

Integrating Geometric Control into Text-to-Image Diffusion Models for High-Quality Detection Data Generation via Text Prompt.

CoRR, 2023

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts.

CoRR, 2023

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation.

CoRR, 2023

Progressive-Hint Prompting Improves Reasoning in Large Language Models.

CoRR, 2023

Vehicle-Infrastructure Cooperative 3D Object Detection via Feature Flow Prediction.

CoRR, 2023

Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception.

CoRR, 2023

Feature Enhancement with Text-Specific Region Contrast for Scene Text Detection.

Proceedings of the Pattern Recognition and Computer Vision - 6th Chinese Conference, 2023

Flow-Based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffComplete: Diffusion-based Generative 3D Shape Completion.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's-Eye View.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DDP: Diffusion Model for Dense Visual Prediction.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Beyond One-to-One: Rethinking the Referring Image Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MetaBEV: Solving Sensor Failures for 3D Detection and Map Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DT-Solver: Automated Theorem Proving with Dynamic-Tree Sampling Guided by Proof-level Value Function.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Improving Monocular Visual Odometry Using Learned Depth.

IEEE Trans. Robotics, 2022

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

PVT v2: Improved baselines with Pyramid Vision Transformer.

Comput. Vis. Media, 2022

Delving into the Devils of Bird's-eye-view Perception: A Review, Evaluation and Recipe.

CoRR, 2022

M²BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation.

CoRR, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers.

CoRR, 2022

WegFormer: Transformers for Weakly Supervised Semantic Segmentation.

CoRR, 2022

Understanding The Robustness in Vision Transformers.

Animashree Anandkumar, Jiashi Feng, José M. Álvarez

Proceedings of the International Conference on Machine Learning, 2022

UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2022

CycleMLP: A MLP-like Architecture for Dense Prediction.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Polygon-Free: Unconstrained Scene Text Detection with Box Annotations.

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers.

Proceedings of the Computer Vision - ECCV 2022, 2022

Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Towards Ultra-Resolution Neural Style Transfer via Thumbnail Instance Normalization.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation.

CoRR, 2021

Panoptic SegFormer.

CoRR, 2021

CycleMLP: A MLP-like Architecture for Dense Prediction.

CoRR, 2021

PVTv2: Improved Baselines with Pyramid Vision Transformer.

CoRR, 2021

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text.

CoRR, 2021

FakeMix Augmentation Improves Transparent Object Detection.

CoRR, 2021

Unsupervised Pretraining for Object Detection by Patch Reidentification.

CoRR, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.

CoRR, 2021

Trans2Seg: Transparent Object Segmentation with Transformer.

CoRR, 2021

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Segmenting Transparent Objects in the Wild with Transformer.

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

What Makes for End-to-End Object Detection?

Proceedings of the 38th International Conference on Machine Learning, 2021

DetCo: Unsupervised Contrastive Learning for Object Detection.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Watch Only Once: An End-to-End Video Action Detection Framework.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020

TransTrack: Multiple-Object Tracking with Transformer.

CoRR, 2020

OneNet: Towards End-to-End One-Stage Object Detection.

CoRR, 2020

SelfText Beyond Polygon: Unconstrained Text Detection with Box Supervision and Dynamic Self-Training.

CoRR, 2020

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild.

Weijia Wu, Ning Lu, Enze Xie

CoRR, 2020

1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation.

CoRR, 2020

Segmenting Transparent Objects in the Wild.

Proceedings of the Computer Vision - ECCV 2020, 2020

Scene Text Image Super-Resolution in the Wild.

Proceedings of the Computer Vision - ECCV 2020, 2020

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting.

Proceedings of the Computer Vision - ECCV 2020, 2020

Differentiable Hierarchical Graph Grouping for Multi-person Pose Estimation.

Proceedings of the Computer Vision - ECCV 2020, 2020

PolarMask: Single Shot Instance Segmentation With Polar Representation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild.

Proceedings of the Computer Vision - ACCV 2020 - 15th Asian Conference on Computer Vision, Kyoto, Japan, November 30, 2020

2019

TextSR: Content-Aware Text Super-Resolution Guided by Recognition.

CoRR, 2019

Shape Robust Text Detection with Progressive Scale Expansion Network.

CoRR, 2019

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Shape Robust Text Detection With Progressive Scale Expansion Network.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Scene Text Detection with Supervised Pyramid Context Network.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018

Fast OBDD Reordering using Neural Message Passing on Hypergraph.

CoRR, 2018

Attention Cropping: A Novel Data Augmentation Method for Real-world Plant Species Identification.

CoRR, 2018

Improving Fine-Grained Object Classification Using Adversarial Generated Unlabelled Samples.

Enze Xie, Guangyao Li, Wenyu Liu

Proceedings of the Fourth IEEE International Conference on Multimedia Big Data, 2018
