Yuhang Zang

ORCID: 0000-0003-1110-5062

According to our database¹, Yuhang Zang authored at least 36 papers between 2019 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.

CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.

CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.

CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.

CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation.

CoRR, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.

CoRR, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.

CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.

CoRR, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation.

CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.

CoRR, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data.

CoRR, 2024

Streaming Long Video Understanding with Large Language Models.

CoRR, 2024

Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo.

CoRR, 2024

Unified Scene Representation and Reconstruction for 3D Large Language Models.

CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

CoRR, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition.

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Long-CLIP: Unlocking the Long-Text Capability of CLIP.

Proceedings of the Computer Vision - ECCV 2024, 2024

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo.

Proceedings of the Computer Vision - ECCV 2024, 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever you Want.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Semi-Supervised and Long-Tailed Object Detection with CascadeMatch.

Int. J. Comput. Vis., April, 2023

Contextual Object Detection with Multimodal Large Language Models.

CoRR, 2023

2022

Unified Vision and Language Prompt Learning.

CoRR, 2022

On-Device Domain Generalization.

CoRR, 2022

Open-Vocabulary DETR with Conditional Matching.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation.

Yuhang Zang, Chen Huang, Chen Change Loy

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation.

CoRR, 2020

KPNet: Towards Minimal Face Detector.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Scene Text Detection with Supervised Pyramid Context Network.

Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
