Yuanhan Zhang

ORCID: 0000-0002-9063-7886

According to our database, Yuanhan Zhang authored at least 30 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Video Instruction Tuning With Synthetic Data.
CoRR, 2024

LLaVA-OneVision: Easy Visual Task Transfer.
CoRR, 2024

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models.
CoRR, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models.
CoRR, 2024

Long Context Transfer from Language to Vision.
CoRR, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning.
CoRR, 2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward.
CoRR, 2024

3D Point Cloud Pre-Training with Knowledge Distilled from 2D Images.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

Exploring the Integration of Light and Music in Artistic Furniture Design: A Study in Interaction Design Informed by Children's Climbing Behavior.
Proceedings of the Human-Computer Interaction, 2024

Octopus: Embodied Vision-Language Programmer from Environmental Feedback.
Proceedings of the Computer Vision - ECCV 2024, 2024

FunQA: Towards Surprising Video Comprehension.
Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?
Proceedings of the Computer Vision - ECCV 2024, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
OtterHD: A High-Resolution Multi-modality Model.
CoRR, 2023

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images.
CoRR, 2023

FunQA: Towards Surprising Video Comprehension.
CoRR, 2023

MIMIC-IT: Multi-Modal In-Context Instruction Tuning.
CoRR, 2023

Learning without Forgetting for Vision-Language Models.
CoRR, 2023

Latent Distribution Adjusting for Face Anti-Spoofing.
CoRR, 2023

Otter: A Multi-Modal Model with In-Context Instruction Tuning.
CoRR, 2023

What Makes Good Examples for Visual In-Context Learning?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images.
CoRR, 2022

On-Device Domain Generalization.
CoRR, 2022

Neural Prompt Search.
CoRR, 2022

Robust Face Anti-Spoofing with Dual Probabilistic Modeling.
CoRR, 2022

Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy.
CoRR, 2022

Benchmarking Omni-Vision Representation Through the Lens of Visual Realms.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
CelebA-Spoof Challenge 2020 on Face Anti-Spoofing: Methods and Results.
CoRR, 2021

2020
CelebA-Spoof: Large-Scale Face Anti-spoofing Dataset with Rich Annotations.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Makeup based on segmentation and local transfer.
Proceedings of the 6th International Conference on Behavioral, 2019
