Siyu Zhu

Orcid: 0000-0003-0293-0044

Affiliations:

Alibaba Group, A. I. Labs, Hangzhou, China
Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong (PhD 2017)

According to our database¹, Siyu Zhu authored at least 64 papers between 2014 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2025

MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model.

Pattern Recognit., 2025

Text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders.

Pattern Recognit., 2025

2024

Open-Vocabulary Category-Level Object Pose and Size Estimation.

IEEE Robotics Autom. Lett., September, 2024

Towards Native Generative Model for 3D Head Avatar.

CoRR, 2024

4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment.

CoRR, 2024

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.

CoRR, 2024

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.

CoRR, 2024

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation.

CoRR, 2024

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model.

CoRR, 2024

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.

Proceedings of the Computer Vision - ECCV 2024, 2024

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians.

Proceedings of the Computer Vision - ECCV 2024, 2024

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.

Proceedings of the Computer Vision - ECCV 2024, 2024

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition.

Proceedings of the Computer Vision - ECCV 2024, 2024

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head.

Proceedings of the Computer Vision - ECCV 2024, 2024

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

DRO: Deep Recurrent Optimizer for Video to Depth.

IEEE Robotics Autom. Lett., May, 2023

Fine-Grained Open Domain Image Animation with Motion Guidance.

CoRR, 2023

Fine-grained Text-Video Retrieval with Frozen Image Encoders.

CoRR, 2023

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model.

CoRR, 2023

Towards Robust Video Instance Segmentation with Temporal-Aware Transformer.

CoRR, 2023

Learning Aligned Cross-modal Representations for Referring Image Segmentation.

CoRR, 2023

Monocular Scene Reconstruction with 3D SDF Transformers.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments.

CoRR, 2022

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds.

CoRR, 2022

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation.

CoRR, 2022

Quadtree Attention for Vision Transformers.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Neural Window Fully-connected CRFs for Monocular Depth Estimation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RCP: Recurrent Closest Point for Point Cloud.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cluster Contrast for Unsupervised Person Re-identification.

Proceedings of the Computer Vision - ACCV 2022, 2022

GB-CosFace: Rethinking Softmax-Based Face Recognition from the Perspective of Open Set Classification.

Proceedings of the Computer Vision - ACCV 2022, 2022

2021

UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation.

IEEE Robotics Autom. Lett., 2021

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification.

CoRR, 2021

AR Mapping: Accurate and Efficient Mapping for Augmented Reality.

CoRR, 2021

Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment.

CoRR, 2021

DRO: Deep Recurrent Optimizer for Structure-from-Motion.

CoRR, 2021

UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation.

CoRR, 2021

Stereo Matching by Self-supervision of Multiscopic Vision.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Single-Shot is Enough: Panoramic Infrastructure Based Calibration of Multiple Cameras and 3D LiDARs.

Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Camera Localization via Dense Scene Matching.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

MeshMVS: Multi-View Stereo Guided Mesh Reconstruction.

Proceedings of the International Conference on 3D Vision, 2021

2020

Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.

IEEE Trans. Pattern Anal. Mach. Intell., 2020

Self-Supervised Human Depth Estimation From Monocular Videos.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

A Neural Network for Detailed Human Depth Estimation From a Single Image.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Batch DropBlock Network for Person Re-Identification and Beyond.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018

Batch Feature Erasing for Person Re-identification and Beyond.

CoRR, 2018

Learning and Matching Multi-View Descriptors for Registration of Point Clouds.

Proceedings of the Computer Vision - ECCV 2018, 2018

GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints.

Proceedings of the Computer Vision - ECCV 2018, 2018

Very Large-Scale Global SfM by Distributed Motion Averaging.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Matchable Image Retrieval by Learning from Surface Reconstruction.

Proceedings of the Computer Vision - ACCV 2018, 2018

2017

Accurate, Scalable and Parallel Structure from Motion.

CoRR, 2017

Progressive Large Scale-Invariant Image Matching in Scale Space.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.

Proceedings of the IEEE International Conference on Computer Vision, 2017

Relative Camera Refinement for Accurate Dense Reconstruction.

Proceedings of the 2017 International Conference on 3D Vision, 2017

2016

Image-Based Building Regularization Using Structural Linear Features.

IEEE Trans. Vis. Comput. Graph., 2016

Graph-Based Consistent Matching for Structure-from-Motion.

Proceedings of the Computer Vision - ECCV 2016, 2016

Color Correction for Image-Based Modeling in the Large.

Proceedings of the Computer Vision - ACCV 2016, 2016

2015

Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-view Stereo.

Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

2014

Local Readjustment for High-Resolution 3D Reconstruction.

Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Multi-view Geometry Compression.

Proceedings of the Computer Vision - ACCV 2014, 2014

Multi-scale Tetrahedral Fusion of a Similarity Reconstruction and Noisy Positional Measurements.

Proceedings of the Computer Vision - ACCV 2014, 2014
