Siyu Zhu

Orcid: 0000-0003-0293-0044

Affiliations:
  • Alibaba Group, A. I. Labs, Hangzhou, China
  • Hong Kong University of Science and Technology, Department of Computer Science and Engineering, Hong Kong (PhD 2017)


According to our database, Siyu Zhu authored at least 64 papers between 2014 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model.
Pattern Recognit., 2025

Text-video retrieval re-ranking via multi-grained cross attention and frozen image encoders.
Pattern Recognit., 2025

2024
Open-Vocabulary Category-Level Object Pose and Size Estimation.
IEEE Robotics Autom. Lett., September, 2024

Towards Native Generative Model for 3D Head Avatar.
CoRR, 2024

4D Diffusion for Dynamic Protein Structure Prediction with Reference Guided Motion Alignment.
CoRR, 2024

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.
CoRR, 2024

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.
CoRR, 2024

OV9D: Open-Vocabulary Category-Level 9D Object Pose and Size Estimation.
CoRR, 2024

VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model.
CoRR, 2024

Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance.
Proceedings of the Computer Vision - ECCV 2024, 2024

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians.
Proceedings of the Computer Vision - ECCV 2024, 2024

Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°.
Proceedings of the Computer Vision - ECCV 2024, 2024

Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition.
Proceedings of the Computer Vision - ECCV 2024, 2024

EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head.
Proceedings of the Computer Vision - ECCV 2024, 2024

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
DRO: Deep Recurrent Optimizer for Video to Depth.
IEEE Robotics Autom. Lett., May, 2023

Fine-Grained Open Domain Image Animation with Motion Guidance.
CoRR, 2023

Fine-grained Text-Video Retrieval with Frozen Image Encoders.
CoRR, 2023

UVOSAM: A Mask-free Paradigm for Unsupervised Video Object Segmentation via Segment Anything Model.
CoRR, 2023

Towards Robust Video Instance Segmentation with Temporal-Aware Transformer.
CoRR, 2023

Learning Aligned Cross-modal Representations for Referring Image Segmentation.
CoRR, 2023

Monocular Scene Reconstruction with 3D SDF Transformers.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments.
CoRR, 2022

RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds.
CoRR, 2022

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation.
CoRR, 2022

Quadtree Attention for Vision Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Neural Window Fully-connected CRFs for Monocular Depth Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

RCP: Recurrent Closest Point for Point Cloud.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cluster Contrast for Unsupervised Person Re-identification.
Proceedings of the Computer Vision - ACCV 2022, 2022

GB-CosFace: Rethinking Softmax-Based Face Recognition from the Perspective of Open Set Classification.
Proceedings of the Computer Vision - ACCV 2022, 2022

2021
UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation.
IEEE Robotics Autom. Lett., 2021

GB-CosFace: Rethinking Softmax-based Face Recognition from the Perspective of Open Set Classification.
CoRR, 2021

AR Mapping: Accurate and Efficient Mapping for Augmented Reality.
CoRR, 2021

Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment.
CoRR, 2021

DRO: Deep Recurrent Optimizer for Structure-from-Motion.
CoRR, 2021

UniFuse: Unidirectional Fusion for 360° Panorama Depth Estimation.
CoRR, 2021

Stereo Matching by Self-supervision of Multiscopic Vision.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

Single-Shot is Enough: Panoramic Infrastructure Based Calibration of Multiple Cameras and 3D LiDARs.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021

CondLaneNet: a Top-to-down Lane Detection Framework Based on Conditional Convolution.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

FloorPlanCAD: A Large-Scale CAD Drawing Dataset for Panoptic Symbol Spotting.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Camera Localization via Dense Scene Matching.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

MeshMVS: Multi-View Stereo Guided Mesh Reconstruction.
Proceedings of the International Conference on 3D Vision, 2021

2020
Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Self-Supervised Human Depth Estimation From Monocular Videos.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

End-to-End Learning Local Multi-View Descriptors for 3D Point Clouds.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
A Neural Network for Detailed Human Depth Estimation From a Single Image.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Batch DropBlock Network for Person Re-Identification and Beyond.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Batch Feature Erasing for Person Re-identification and Beyond.
CoRR, 2018

Learning and Matching Multi-View Descriptors for Registration of Point Clouds.
Proceedings of the Computer Vision - ECCV 2018, 2018

GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints.
Proceedings of the Computer Vision - ECCV 2018, 2018

Very Large-Scale Global SfM by Distributed Motion Averaging.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Matchable Image Retrieval by Learning from Surface Reconstruction.
Proceedings of the Computer Vision - ACCV 2018, 2018

2017
Accurate, Scalable and Parallel Structure from Motion.
CoRR, 2017

Progressive Large Scale-Invariant Image Matching in Scale Space.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Relative Camera Refinement for Accurate Dense Reconstruction.
Proceedings of the 2017 International Conference on 3D Vision, 2017

2016
Image-Based Building Regularization Using Structural Linear Features.
IEEE Trans. Vis. Comput. Graph., 2016

Graph-Based Consistent Matching for Structure-from-Motion.
Proceedings of the Computer Vision - ECCV 2016, 2016

Color Correction for Image-Based Modeling in the Large.
Proceedings of the Computer Vision - ACCV 2016, 2016

2015
Joint Camera Clustering and Surface Segmentation for Large-Scale Multi-view Stereo.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015

2014
Local Readjustment for High-Resolution 3D Reconstruction.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

Multi-view Geometry Compression.
Proceedings of the Computer Vision - ACCV 2014, 2014

Multi-scale Tetrahedral Fusion of a Similarity Reconstruction and Noisy Positional Measurements.
Proceedings of the Computer Vision - ACCV 2014, 2014

