Xiangyu Zhang

Orcid: 0000-0003-2138-4608

Affiliations:

Megvii Inc., Beijing, China
Xi'an Jiaotong University, Department of Electrical Engineering, China (PhD 2017)
Microsoft Research Asia, China (former)

According to our database, Xiangyu Zhang authored at least 136 papers between 2012 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping.

IEEE Robotics Autom. Lett., November, 2024

Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.

IEEE Robotics Autom. Lett., July, 2024

Reconstructive Visual Instruction Tuning.

CoRR, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.

CoRR, 2024

Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving.

CoRR, 2024

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.

CoRR, 2024

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?

CoRR, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding.

CoRR, 2024

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control.

CoRR, 2024

Small Language Model Meets with Reinforced Vision Vocabulary.

CoRR, 2024

Stream Query Denoising for Vectorized HD Map Construction.

CoRR, 2024

Self-Supervised Visual Preference Alignment.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Merlin: Empowering Multimodal LLMs with Foresight Minds.

Proceedings of the Computer Vision - ECCV 2024, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model.

Proceedings of the Computer Vision - ECCV 2024, 2024

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Far3D: Expanding the Horizon for Surround-View 3D Object Detection.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DDAE: Towards Deep Dynamic Vision BERT Pretraining.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Scale-Aware Automatic Augmentations for Object Detection With Dynamic Training.

IEEE Trans. Pattern Anal. Mach. Intell., 2023

Bootstrap Masked Visual Modeling via Hard Patches Mining.

CoRR, 2023

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models.

CoRR, 2023

ADriver-I: A General World Model for Autonomous Driving.

CoRR, 2023

Language Prompt for Autonomous Driving.

CoRR, 2023

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking.

CoRR, 2023

Self-supervised Learning by View Synthesis.

CoRR, 2023

Align-DETR: Improving DETR with Simple IoU-aware BCE loss.

CoRR, 2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection.

CoRR, 2023

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining.

Proceedings of the International Conference on Machine Learning, 2023

Re-parameterizing Your Optimizers rather than Architectures.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reversible Column Networks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Understanding Imbalanced Semantic Segmentation Through Neural Collapse.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors.

Yuang Zhang

Tiancai Wang

Xiangyu Zhang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Differentiable Architecture Search with Random Features.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Referring Multi-Object Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Understanding Masked Image Modeling via Learning Occlusion Invariant Feature.

Xiangwen Kong

Xiangyu Zhang

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Weight-Dependent Gates for Network Pruning.

IEEE Trans. Circuits Syst. Video Technol., 2022

PointINS: Point-Based Instance Segmentation.

IEEE Trans. Pattern Anal. Mach. Intell., 2022

Towards 3D Object Detection with 2D Supervision.

CoRR, 2022

The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge.

CoRR, 2022

Scaling up Kernels in 3D CNNs.

CoRR, 2022

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.

CoRR, 2022

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs.

CoRR, 2022

Self-Supervised Visual Representation Learning with Semantic Grouping.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MOTR: End-to-End Multiple-Object Tracking with Transformer.

Proceedings of the Computer Vision - ECCV 2022, 2022

PETR: Position Embedding Transformation for Multi-view 3D Object Detection.

Proceedings of the Computer Vision - ECCV 2022, 2022

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning.

Junqiang Huang

Xiangwen Kong

Xiangyu Zhang

Proceedings of the Computer Vision - ECCV 2022, 2022

Simple Baselines for Image Restoration.

Proceedings of the Computer Vision - ECCV 2022, 2022

Progressive End-to-End Object Detection in Crowded Scenes.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Focal Sparse Convolutional Networks for 3D Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LGD: Label-Guided Self-Distillation for Object Detection.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Anchor DETR: Query Design for Transformer-Based Detector.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Joint Multi-Dimension Pruning via Numerical Gradient Update.

IEEE Trans. Image Process., 2021

On Efficient Transformer and Image Pre-training for Low-level Vision.

CoRR, 2021

Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better.

Xuanyang Zhang

Xiangyu Zhang

Jian Sun

CoRR, 2021

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report.

CoRR, 2021

MOTR: End-to-End Multiple-Object Tracking with TRansformer.

CoRR, 2021

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition.

CoRR, 2021

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Instance-Conditional Knowledge Distillation for Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SOLQ: Segmenting Objects by Learning Queries.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Implicit Feature Refinement for Instance Segmentation.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Image Synthesis via Semantic Composition.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Neural Architecture Search With Random Labels.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Activate or Not: Learning Customized Activation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

RepVGG: Making VGG-Style ConvNets Great Again.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Diverse Branch Block: Building a Convolution as an Inception-Like Unit.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

You Only Look One-Level Feature.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dynamic Region-Aware Convolution.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Points As Queries: Weakly Semi-Supervised Object Detection by Points.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Implicit Feature Pyramid Network for Object Detection.

Tiancai Wang

Xiangyu Zhang

Jian Sun

CoRR, 2020

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track.

CoRR, 2020

EqCo: Equivalent Rules for Self-supervised Contrastive Learning.

CoRR, 2020

Activate or Not: Learning Customized Activation.

Ningning Ma

Xiangyu Zhang

Jian Sun

CoRR, 2020

Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay.

CoRR, 2020

Joint Multi-Dimension Pruning.

CoRR, 2020

Stitcher: Feedback-driven Data Provider for Object Detection.

CoRR, 2020

PointINS: Point-based Instance Segmentation.

CoRR, 2020

Rethinking Learnable Tree Filter for Generic Feature Transform.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization.

Proceedings of the 8th International Conference on Learning Representations, 2020

Funnel Activation for Visual Recognition.

Ningning Ma

Xiangyu Zhang

Jian Sun

Proceedings of the Computer Vision - ECCV 2020, 2020

WeightNet: Revisiting the Design Space of Weight Networks.

Proceedings of the Computer Vision - ECCV 2020, 2020

Weight-Dependent Gates for Differentiable Neural Network Pruning.

Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Angle-Based Search Space Shrinking for Neural Architecture Search.

Proceedings of the Computer Vision - ECCV 2020, 2020

LabelEnc: A New Intermediate Supervision Method for Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

Single Path One-Shot Neural Architecture Search with Uniform Sampling.

Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Delicate Local Representations for Multi-person Pose Estimation.

Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Human-Object Interaction Detection Using Interaction Points.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attentive Normalization for Conditional Image Generation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Dynamic Routing for Semantic Segmentation.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Detection in Crowded Scenes: One Proposal, Multiple Predictions.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

DetNAS: Neural Architecture Search on Object Detection.

CoRR, 2019

DetNAS: Backbone Search for Object Detection.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Objects365: A Large-Scale, High-Quality Dataset for Object Detection.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Meta-SR: A Magnification-Arbitrary Network for Super-Resolution.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Bounding Box Regression With Uncertainty for Accurate Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection.

CoRR, 2018

MetaAnchor: Learning to Detect Objects with Customized Anchors.

CoRR, 2018

CrowdHuman: A Benchmark for Detecting Human in a Crowd.

CoRR, 2018

DetNet: A Backbone network for Object Detection.

CoRR, 2018

ExFuse: Enhancing Feature Fusion for Semantic Segmentation.

CoRR, 2018

MetaAnchor: Learning to Detect Objects with Customized Anchors.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

ExFuse: Enhancing Feature Fusion for Semantic Segmentation.

Proceedings of the Computer Vision - ECCV 2018, 2018

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design.

Proceedings of the Computer Vision - ECCV 2018, 2018

DetNet: Design Backbone for Object Detection.

Proceedings of the Computer Vision - ECCV 2018, 2018

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

MegDet: A Large Mini-Batch Object Detector.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017

Object Detection Networks on Convolutional Feature Maps.

IEEE Trans. Pattern Anal. Mach. Intell., 2017

Light-Head R-CNN: In Defense of Two-Stage Object Detector.

CoRR, 2017

Channel Pruning for Accelerating Very Deep Neural Networks.

Yihui He

Xiangyu Zhang

Jian Sun

Proceedings of the IEEE International Conference on Computer Vision, 2017

Large Kernel Matters - Improve Semantic Segmentation by Global Convolutional Network.

Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016

Accelerating Very Deep Convolutional Networks for Classification and Detection.

IEEE Trans. Pattern Anal. Mach. Intell., 2016

Identity Mappings in Deep Residual Networks.

Proceedings of the Computer Vision - ECCV 2016, 2016

Deep Residual Learning for Image Recognition.

Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., 2015

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Efficient and accurate approximations of nonlinear convolutional networks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014

Toward Concurrent Lock-Free Queues on GPUs.

Xiangyu Zhang

Yangdong Deng

Shuai Mu

IEICE Trans. Inf. Syst., 2014

2012

Interconnection of wind farms with grid using a MTDC network.

Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012
