Yi Wang

Affiliations:

Shanghai AI Laboratory, China

According to our database¹, Yi Wang authored at least 52 papers between 2012 and 2025.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Timeline

2012

2014

2016

2018

2020

2022

2024

Legend:

Book

In proceedings

Article

PhD thesis

Dataset

Other

Links

On csauthors.net:

Bibliography

2025

MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion.

[BibT_eX]

[DOI]

Pattern Anal. Appl., March, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.

[BibT_eX]

[DOI]

CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.

[BibT_eX]

[DOI]

CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.

[BibT_eX]

[DOI]

CoRR, January, 2025

2024

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.

[BibT_eX]

[DOI]

CoRR, 2024

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.

[BibT_eX]

[DOI]

CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

[BibT_eX]

[DOI]

CoRR, 2024

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.

[BibT_eX]

[DOI]

CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.

[BibT_eX]

[DOI]

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

[BibT_eX]

[DOI]

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

[BibT_eX]

[DOI]

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

[BibT_eX]

[DOI]

CoRR, 2024

SyncVIS: Synchronized Video Instance Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

[BibT_eX]

[DOI]

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2024, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

Conditional Temporal Variational AutoEncoder for Action Video Prediction.

[BibT_eX]

[DOI]

Int. J. Comput. Vis., October, 2023

Open World Entity Segmentation.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

[BibT_eX]

[DOI]

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

[BibT_eX]

[DOI]

CoRR, 2023

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models.

[BibT_eX]

[DOI]

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

[BibT_eX]

[DOI]

CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

[BibT_eX]

[DOI]

CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.

[BibT_eX]

[DOI]

CoRR, 2023

VideoChat: Chat-Centric Video Understanding.

[BibT_eX]

[DOI]

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

[BibT_eX]

[DOI]

CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Data Generation in Vision-and-Language Navigation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

PointINS: Point-Based Instance Segmentation.

[BibT_eX]

[DOI]

IEEE Trans. Pattern Anal. Mach. Intell., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

[BibT_eX]

[DOI]

CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.

[BibT_eX]

[DOI]

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

[BibT_eX]

[DOI]

CoRR, 2022

PalGAN: Image Colorization with Palette Generative Adversarial Networks.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Implicit Text-Guided 3D Shape Generation.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAT: Mask-Aware Transformer for Large Hole Image Inpainting.

[BibT_eX]

[DOI]

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Image Synthesis via Semantic Composition.

[BibT_eX]

[DOI]

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

VCNet: A Robust Approach to Blind Image Inpainting.

[BibT_eX]

[DOI]

Proceedings of the Computer Vision - ECCV 2020, 2020

Attentive Normalization for Conditional Image Generation.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Wide-Context Semantic Image Extrapolation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Scale-recurrent Network for Deep Image Deblurring.

[BibT_eX]

[DOI]

CoRR, 2018

Image Inpainting via Generative Multi-column Convolutional Neural Networks.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2012

Interconnection of wind farms with grid using a MTDC network.

[BibT_eX]

[DOI]

Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012

Yi Wang

Timeline

Legend:

Links

On csauthors.net:

Bibliography

Loading...