Han Hu

Orcid: 0000-0001-5104-6146

Affiliations:
  • Microsoft Research Asia, Beijing, China
  • Tsinghua University, Department of Automation, Tsinghua National Laboratory for Information Science and Technology, Beijing, China (PhD 2014)


According to our database, Han Hu authored at least 111 papers between 2009 and 2025.

Bibliography

2025
A Survey on Video Diffusion Models.
ACM Comput. Surv., February, 2025

2024
Expediting Large-Scale Vision Transformer for Dense Prediction Without Fine-Tuning.
IEEE Trans. Pattern Anal. Mach. Intell., January, 2024

Xwin-LM: Strong and Scalable Alignment Practice for LLMs.
CoRR, 2024

Common 7B Language Models Already Possess Strong Math Capabilities.
CoRR, 2024

Unsupervised Graphic Layout Grouping with Transformers.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

LarvSeg: Exploring Image Classification Data for Large Vocabulary Semantic Segmentation via Category-Wise Attentive Classifier.
Proceedings of the Pattern Recognition and Computer Vision - 7th Chinese Conference, 2024

Data-efficient Large Vision Models through Sequential Autoregression.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

GAIA: Zero-shot Talking Avatar Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

SimDA: Simple Diffusion Adapter for Efficient Video Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Multiple View Geometry Transformers for 3D Human Pose Estimation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Segment and Caption Anything.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2023

Global Context Networks.
IEEE Trans. Pattern Anal. Mach. Intell., June, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models.
CoRR, 2023

FP8-LM: Training FP8 Large Language Models.
CoRR, 2023

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance.
CoRR, 2023

InstructDiffusion: A Generalist Modeling Interface for Vision Tasks.
CoRR, 2023

DETR Doesn't Need Multi-Scale or Locality Design.
CoRR, 2023

GlyphControl: Glyph Conditional Control for Visual Text Generation.
CoRR, 2023

VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale.
CoRR, 2023

DeepMIM: Deep Supervision for Masked Image Modeling.
CoRR, 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token.
CoRR, 2023

GlyphControl: Glyph Conditional Control for Visual Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Rank-DETR for High Quality Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tutel: Adaptive Mixture-of-Experts at Scale.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

ClipCrop: Conditioned Cropping Driven by Vision-Language Model.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Attentive Mask CLIP.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Improving CLIP Fine-tuning Performance.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

All in Tokens: Unifying Output Space of Visual Tasks via Soft Token.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

DETR Does Not Need Multi-Scale or Locality Design.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Mask-Attention-Free Transformer for 3D Instance Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Efficient Diffusion Training via Min-SNR Weighting Strategy.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Side Adapter Network for Open-Vocabulary Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SVFormer: Semi-supervised Video Transformer for Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

On Data Scaling in Masked Image Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Revealing the Dark Secrets of Masked Image Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-training for Visual Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

ResFormer: Scaling ViTs with Multi-Resolution Training.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

DETRs with Hybrid Matching.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Human Pose as Compositional Tokens.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Exploring Non-additive Randomness on ViT against Query-Based Black-Box Attacks.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022
Exploring Discrete Diffusion Models for Image Captioning.
CoRR, 2022

Could Giant Pretrained Image Models Extract Universal Representations?
CoRR, 2022

Tutel: Adaptive Mixture-of-Experts at Scale.
CoRR, 2022

Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation.
CoRR, 2022

Deeper Insights into ViTs Robustness towards Common Corruptions.
CoRR, 2022

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition.
CoRR, 2022

Enhancing the Robustness, Efficiency, and Diversity of Differentiable Architecture Search.
CoRR, 2022

Region Rebalance for Long-Tailed Semantic Segmentation.
CoRR, 2022

MLSeg: Image and Video Segmentation as Multi-Label Classification and Selected-Label Pixel Classification.
CoRR, 2022

Could Giant Pre-trained Image Models Extract Universal Representations?
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Graph Hawkes Transformer for Extrapolated Reasoning on Temporal Knowledge Graphs.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

A Simple Baseline for Open-Vocabulary Semantic Segmentation with Pre-trained Vision-Language Model.
Proceedings of the Computer Vision - ECCV 2022, 2022

A Simple Approach and Benchmark for 21,000-Category Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

SimMIM: a Simple Framework for Masked Image Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Video Swin Transformer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Swin Transformer V2: Scaling Up Capacity and Resolution.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
A Simple Baseline for Zero-shot Semantic Segmentation with Pre-trained Vision-language Model.
CoRR, 2021

Breaking Shortcut: Exploring Fully Convolutional Cycle-Consistency for Video Correspondence Learning.
CoRR, 2021

Self-Supervised Learning with Swin Transformers.
CoRR, 2021

Bootstrap Your Object Detector via Mixed Training.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Aligning Pretraining for Detection via Object-Level Contrastive Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Semi-Supervised Semantic Segmentation via Adaptive Equalization Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Leveraging Batch Normalization for Vision Transformers.
Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2021

End-to-End Semi-Supervised Object Detection with Soft Teacher.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Group-Free 3D Object Detection via Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Capsule Network Is Not More Robust Than Convolutional Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Boosting Adversarial Transferability through Enhanced Momentum.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

RepPoints v2: Verification Meets Regression for Object Detection.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Parametric Instance Classification for Unsupervised Visual Feature Learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Disentangled Non-local Neural Networks.
Proceedings of the Computer Vision - ECCV 2020, 2020

Dense RepPoints: Representing Visual Objects with Dense Point Sets.
Proceedings of the Computer Vision - ECCV 2020, 2020

A Closer Look at Local Aggregation Operators in Point Cloud Analysis.
Proceedings of the Computer Vision - ECCV 2020, 2020

Negative Margin Matters: Understanding Margin in Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2020, 2020

Memory Enhanced Global-Local Aggregation for Video Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Deep Metric Transfer for Label Propagation with Limited Annotated Data.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

GCNet: Non-Local Networks Meet Squeeze-Excitation Networks and Beyond.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, 2019

RepPoints: Point Set Representation for Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Spatial-Temporal Relation Networks for Multi-Object Tracking.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Local Relation Networks for Image Recognition.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Deformable ConvNets V2: More Deformable, Better Results.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Learning Region Features for Object Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

Relation Networks for Object Detection.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
Deformable Convolutional Networks.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Depth Estimation Using a Sliding Camera.
IEEE Trans. Image Process., 2016

2015
Exploiting Unsupervised and Supervised Constraints for Subspace Clustering.
IEEE Trans. Pattern Anal. Mach. Intell., 2015

Progressive feature matching via triplet graph.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

2014
Smooth Representation Clustering.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014

2013
Multi-Class Constrained Normalized Cut With Hard, Soft, Unary and Pairwise Priors and its Applications to Object Segmentation.
IEEE Trans. Image Process., 2013

2012
Multi-way constrained spectral clustering by nonnegative restriction.
Proceedings of the 21st International Conference on Pattern Recognition, 2012

2011
Video Stabilization and Completion Using Two Cameras.
IEEE Trans. Circuits Syst. Video Technol., 2011

2010
HTF: a novel feature for general crack detection.
Proceedings of the International Conference on Image Processing, 2010

Trajectory matching from unsynchronized videos.
Proceedings of the Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition, 2010

2009
Multiframe Motion Segmentation via Penalized MAP Estimation and Linear Programming.
Proceedings of the British Machine Vision Conference, 2009

