Xiangyu Zhang

ORCID: 0000-0003-2138-4608

Affiliations:
  • Megvii Inc., Beijing, China
  • Xi'an Jiaotong University, Department of Electrical Engineering, China (PhD 2017)
  • Microsoft Research Asia, China (former)


According to our database, Xiangyu Zhang authored at least 136 papers between 2012 and 2024.

Bibliography

2024
GroupLane: End-to-End 3D Lane Detection With Channel-Wise Grouping.
IEEE Robotics Autom. Lett., November, 2024

Exploring Recurrent Long-Term Temporal Fusion for Multi-View 3D Perception.
IEEE Robotics Autom. Lett., July, 2024

Reconstructive Visual Instruction Tuning.
CoRR, 2024

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model.
CoRR, 2024

Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving.
CoRR, 2024

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation.
CoRR, 2024

Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?
CoRR, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding.
CoRR, 2024

SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control.
CoRR, 2024

Small Language Model Meets with Reinforced Vision Vocabulary.
CoRR, 2024

Stream Query Denoising for Vectorized HD Map Construction.
CoRR, 2024

Self-Supervised Visual Preference Alignment.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

OneChart: Purify the Chart Structural Extraction via One Auxiliary Token.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

DreamLLM: Synergistic Multimodal Comprehension and Creation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Merlin: Empowering Multimodal LLMs with Foresight Minds.
Proceedings of the Computer Vision - ECCV 2024, 2024

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Compound Text-Guided Prompt Tuning via Image-Adaptive Cues.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Far3D: Expanding the Horizon for Surround-View 3D Object Detection.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DDAE: Towards Deep Dynamic Vision BERT Pretraining.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Scale-Aware Automatic Augmentations for Object Detection With Dynamic Training.
IEEE Trans. Pattern Anal. Mach. Intell., 2023

Bootstrap Masked Visual Modeling via Hard Patches Mining.
CoRR, 2023

Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models.
CoRR, 2023

ADriver-I: A General World Model for Autonomous Driving.
CoRR, 2023

Language Prompt for Autonomous Driving.
CoRR, 2023

MOTRv3: Release-Fetch Supervision for End-to-End Multi-Object Tracking.
CoRR, 2023

Self-supervised Learning by View Synthesis.
CoRR, 2023

Align-DETR: Improving DETR with Simple IoU-aware BCE loss.
CoRR, 2023

Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection.
CoRR, 2023

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining.
Proceedings of the International Conference on Machine Learning, 2023

Re-parameterizing Your Optimizers rather than Architectures.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Reversible Column Networks.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

SCSC: Spatial Cross-scale Convolution Module to Strengthen both CNNs and Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Cross Modal Transformer: Towards Fast and Robust 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Understanding Imbalanced Semantic Segmentation Through Neural Collapse.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Differentiable Architecture Search with Random Features.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Referring Multi-Object Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Understanding Masked Image Modeling via Learning Occlusion Invariant Feature.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Weight-Dependent Gates for Network Pruning.
IEEE Trans. Circuits Syst. Video Technol., 2022

PointINS: Point-Based Instance Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Towards 3D Object Detection with 2D Supervision.
CoRR, 2022

The 1st-place Solution for ECCV 2022 Multiple People Tracking in Group Dance Challenge.
CoRR, 2022

Scaling up Kernels in 3D CNNs.
CoRR, 2022

PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images.
CoRR, 2022

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs.
CoRR, 2022

Self-Supervised Visual Representation Learning with Semantic Grouping.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

MOTR: End-to-End Multiple-Object Tracking with Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

PETR: Position Embedding Transformation for Multi-view 3D Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

Revisiting the Critical Factors of Augmentation-Invariant Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Simple Baselines for Image Restoration.
Proceedings of the Computer Vision - ECCV 2022, 2022

Progressive End-to-End Object Detection in Crowded Scenes.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

Tree Energy Loss: Towards Sparsely Annotated Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Relieving Long-tailed Instance Segmentation via Pairwise Class Balance.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Scaling Up Your Kernels to 31×31: Revisiting Large Kernel Design in CNNs.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Focal Sparse Convolutional Networks for 3D Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

LGD: Label-Guided Self-Distillation for Object Detection.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Anchor DETR: Query Design for Transformer-Based Detector.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Joint Multi-Dimension Pruning via Numerical Gradient Update.
IEEE Trans. Image Process., 2021

On Efficient Transformer and Image Pre-training for Low-level Vision.
CoRR, 2021

Partial to Whole Knowledge Distillation: Progressive Distilling Decomposed Knowledge Boosts Student Better.
CoRR, 2021

Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report.
CoRR, 2021

MOTR: End-to-End Multiple-Object Tracking with TRansformer.
CoRR, 2021

RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition.
CoRR, 2021

Spherical Motion Dynamics: Learning Dynamics of Normalized Neural Network using SGD and Weight Decay.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Instance-Conditional Knowledge Distillation for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

SOLQ: Segmenting Objects by Learning Queries.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Implicit Feature Refinement for Instance Segmentation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Image Synthesis via Semantic Composition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Neural Architecture Search With Random Labels.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Activate or Not: Learning Customized Activation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

RepVGG: Making VGG-Style ConvNets Great Again.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Diverse Branch Block: Building a Convolution as an Inception-Like Unit.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

You Only Look One-Level Feature.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dynamic Region-Aware Convolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Points As Queries: Weakly Semi-Supervised Object Detection by Points.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Co-mining: Self-Supervised Learning for Sparsely Annotated Object Detection.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Implicit Feature Pyramid Network for Object Detection.
CoRR, 2020

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track.
CoRR, 2020

EqCo: Equivalent Rules for Self-supervised Contrastive Learning.
CoRR, 2020

Activate or Not: Learning Customized Activation.
CoRR, 2020

Spherical Motion Dynamics of Deep Neural Networks with Batch Normalization and Weight Decay.
CoRR, 2020

Joint Multi-Dimension Pruning.
CoRR, 2020

Stitcher: Feedback-driven Data Provider for Object Detection.
CoRR, 2020

PointINS: Point-based Instance Segmentation.
CoRR, 2020

Rethinking Learnable Tree Filter for Generic Feature Transform.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization.
Proceedings of the 8th International Conference on Learning Representations, 2020

Funnel Activation for Visual Recognition.
Proceedings of the Computer Vision - ECCV 2020, 2020

WeightNet: Revisiting the Design Space of Weight Networks.
Proceedings of the Computer Vision - ECCV 2020, 2020

Weight-Dependent Gates for Differentiable Neural Network Pruning.
Proceedings of the Computer Vision - ECCV 2020 Workshops, 2020

Angle-Based Search Space Shrinking for Neural Architecture Search.
Proceedings of the Computer Vision - ECCV 2020, 2020

LabelEnc: A New Intermediate Supervision Method for Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Single Path One-Shot Neural Architecture Search with Uniform Sampling.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Delicate Local Representations for Multi-person Pose Estimation.
Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Human-Object Interaction Detection Using Interaction Points.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Attentive Normalization for Conditional Image Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Dynamic Routing for Semantic Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Detection in Crowded Scenes: One Proposal, Multiple Predictions.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
DetNAS: Neural Architecture Search on Object Detection.
CoRR, 2019

DetNAS: Backbone Search for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Objects365: A Large-Scale, High-Quality Dataset for Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Meta-SR: A Magnification-Arbitrary Network for Super-Resolution.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Bounding Box Regression With Uncertainty for Accurate Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection.
CoRR, 2018

MetaAnchor: Learning to Detect Objects with Customized Anchors.
CoRR, 2018

CrowdHuman: A Benchmark for Detecting Human in a Crowd.
CoRR, 2018

DetNet: A Backbone network for Object Detection.
CoRR, 2018

ExFuse: Enhancing Feature Fusion for Semantic Segmentation.
CoRR, 2018

MetaAnchor: Learning to Detect Objects with Customized Anchors.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

ExFuse: Enhancing Feature Fusion for Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2018, 2018

ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design.
Proceedings of the Computer Vision - ECCV 2018, 2018

DetNet: Design Backbone for Object Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

MegDet: A Large Mini-Batch Object Detector.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Object Detection Networks on Convolutional Feature Maps.
IEEE Trans. Pattern Anal. Mach. Intell., 2017

Light-Head R-CNN: In Defense of Two-Stage Object Detector.
CoRR, 2017

Channel Pruning for Accelerating Very Deep Neural Networks.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Large Kernel Matters - Improve Semantic Segmentation by Global Convolutional Network.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Accelerating Very Deep Convolutional Networks for Classification and Detection.
IEEE Trans. Pattern Anal. Mach. Intell., 2016

Identity Mappings in Deep Residual Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

Deep Residual Learning for Image Recognition.
Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016

2015
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

Efficient and accurate approximations of nonlinear convolutional networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Toward Concurrent Lock-Free Queues on GPUs.
IEICE Trans. Inf. Syst., 2014

2012
Interconnection of wind farms with grid using a MTDC network.
Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012

