Zhaoxin Fan

Orcid: 0000-0002-6324-1712

According to our database, Zhaoxin Fan authored at least 60 papers between 2016 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding.

CoRR, January, 2025

Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.

Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024

MonoSIM: Simulating Learning Behaviors of Heterogeneous Point Cloud Object Detectors for Monocular 3-D Object Detection.

IEEE Trans. Instrum. Meas., 2024

A novel transformer autoencoder for multi-modal emotion recognition with incomplete data.

Neural Networks, 2024

EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers.

CoRR, 2024

MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing.

CoRR, 2024

Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images.

CoRR, 2024

CoheDancers: Enhancing Interactive Group Dance Generation through Music-Driven Coherence Decomposition.

CoRR, 2024

Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation.

CoRR, 2024

Moderating the Generalization of Score-based Generative Model.

CoRR, 2024

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction.

CoRR, 2024

LaDTalk: Latent Denoising for Synthesizing Talking Head Videos with High Frequency Details.

CoRR, 2024

VGG-Tex: A Vivid Geometry-Guided Facial Texture Estimation Model for High Fidelity Monocular 3D Face Reconstruction.

CoRR, 2024

Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation.

CoRR, 2024

GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer.

CoRR, 2024

MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling.

CoRR, 2024

A Comprehensive Taxonomy and Analysis of Talking Head Synthesis: Techniques for Portrait Generation, Driving Mechanisms, and Editing.

CoRR, 2024

Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs.

CoRR, 2024

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail.

CoRR, 2024

AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration.

CoRR, 2024

Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning.

CoRR, 2024

Multi-dimensional Fusion and Consistency for Semi-supervised Medical Image Segmentation.

Yixing Lu, Zhaoxin Fan, Min Xu

Proceedings of the MultiMedia Modeling - 30th International Conference, 2024

STDG: Semi-Teacher-Student Training Paradigm for Depth-guided One-stage Scene Graph Generation.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

BeatDance: A Beat-Based Model-Agnostic Contrastive Learning Framework for Music-Dance Retrieval.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

CoDancers: Music-Driven Coherent Group Dance Generation with Choreographic Unit.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

PoseRec: 3D Human Pose Driven Online Advertisement Recommendation for Micro-videos.

Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024

ESTGN: Enhanced Self-Mined Text Guided Super-Resolution Network for Superior Image Super Resolution.

Proceedings of the IEEE International Conference on Acoustics, 2024

MLPHand: Real Time Multi-view 3D Hand Reconstruction via MLP Modeling.

Proceedings of the Computer Vision - ECCV 2024, 2024

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Everything2Motion: Synchronizing Diverse Inputs via a Unified Framework for Human Motion Synthesis.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Deep semantic-aware remote sensing image deblurring.

Signal Process., October, 2023

Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview.

ACM Comput. Surv., 2023

STDG: Semi-Teacher-Student Training Paradigram for Depth-guided One-stage Scene Graph Generation.

CoRR, 2023

Benchmarking Ultra-High-Definition Image Reflection Removal.

CoRR, 2023

DenseMP: Unsupervised Dense Pre-training for Few-shot Medical Image Segmentation.

CoRR, 2023

SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

Reconstruction-Aware Prior Distillation for Semi-supervised Point Cloud Completion.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

GIDP: Learning a Good Initialization and Inducing Descriptor Post-enhancing for Large-scale Place Recognition.

Proceedings of the IEEE International Conference on Robotics and Automation, 2023

D-IF: Uncertainty-aware Human Digitization via Implicit Distribution Field.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EmoTalk: Speech-Driven Emotional Disentanglement for 3D Face Animation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Robust Single Image Reflection Removal Against Adversarial Attacks.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

SHLE: Devices Tracking and Depth Filtering for Stereo-based Height Limit Estimation.

CoRR, 2022

FuRPE: Learning Full-body Reconstruction from Part Experts.

CoRR, 2022

Human Pose Driven Object Effects Recommendation.

CoRR, 2022

MonoPCNS: Monocular 3D Object Detection via Point Cloud Network Simulation.

CoRR, 2022

PilotAttnNet: Multi-modal Attention Network for End-to-End Steering Control.

Proceedings of the Pattern Recognition and Computer Vision - 5th Chinese Conference, 2022

Unsupervised Multi-Task Learning for 3D Subtomogram Image Alignment, Clustering and Segmentation.

Proceedings of the 2022 IEEE International Conference on Image Processing, 2022

RPR-Net: A Point Cloud-Based Rotation-Aware Large Scale Place Recognition Network.

Proceedings of the Computer Vision - ECCV 2022 Workshops, 2022

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image.

Proceedings of the Computer Vision - ECCV 2022, 2022

SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation.

CoRR, 2021

Attentive Rotation Invariant Convolution for Point Cloud-based Large Scale Place Recognition.

CoRR, 2021

SVT-Net: A Super Light-Weight Network for Large Scale Place Recognition using Sparse Voxel Transformers.

CoRR, 2021

MPDNet: A 3D Missing Part Detection Network Based on Point Cloud Segmentation.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

A Graph-based One-Shot Learning Method for Point Cloud Recognition.

Comput. Graph. Forum, 2020

SRNet: A 3D Scene Recognition Network using Static Graph and Dense Semantic Fusion.

Comput. Graph. Forum, 2020

DAGC: Employing Dual Attention and Graph Convolution for Point Cloud based Place Recognition.

Proceedings of the 2020 on International Conference on Multimedia Retrieval, 2020

PointFPN: A Frustum-based Feature Pyramid Network for 3D Object Detection.

Proceedings of the 32nd IEEE International Conference on Tools with Artificial Intelligence, 2020

2016

A Text Clustering Approach of Chinese News Based on Neural Network Language Model.

Int. J. Parallel Program., 2016
