Yuhang Zang

ORCID: 0000-0003-1110-5062

According to our database, Yuhang Zang authored at least 36 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation.
CoRR, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations.
CoRR, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs.
CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation.
CoRR, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions.
CoRR, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data.
CoRR, 2024

Streaming Long Video Understanding with Large Language Models.
CoRR, 2024

Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo.
CoRR, 2024

Unified Scene Representation and Reconstruction for 3D Large Language Models.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?
CoRR, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

VLMEvalKit: An Open-Source ToolKit for Evaluating Large Multi-Modality Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Long-CLIP: Unlocking the Long-Text Capability of CLIP.
Proceedings of the European Conference on Computer Vision, ECCV 2024, 2024

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo.
Proceedings of the European Conference on Computer Vision, ECCV 2024, 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever you Want.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Semi-Supervised and Long-Tailed Object Detection with CascadeMatch.
Int. J. Comput. Vis., April, 2023

Contextual Object Detection with Multimodal Large Language Models.
CoRR, 2023

2022
Unified Vision and Language Prompt Learning.
CoRR, 2022

On-Device Domain Generalization.
CoRR, 2022

Open-Vocabulary DETR with Conditional Matching.
Proceedings of the European Conference on Computer Vision, ECCV 2022, 2022

2021
FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
1st Place Solutions for OpenImage2019 - Object Detection and Instance Segmentation.
CoRR, 2020

KPNet: Towards Minimal Face Detector.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Scene Text Detection with Supervised Pyramid Context Network.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
