Yi Wang

Affiliations:
  • Shanghai AI Laboratory, China


According to our database, Yi Wang authored at least 52 papers between 2012 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: publications per year, 2012 to 2024, broken down by type (book, in proceedings, article, PhD thesis, dataset, other).]

Bibliography

2025
MSFM-UNET: enhancing medical image segmentation with multi-scale and multi-view frequency fusion.
Pattern Anal. Appl., March, 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling.
CoRR, January, 2025

Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models.
CoRR, January, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling.
CoRR, January, 2025

2024
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.
CoRR, 2024

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.
CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.
CoRR, 2024

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.
CoRR, 2024

ViLLa: Video Reasoning Segmentation with Large Language Model.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

SyncVIS: Synchronized Video Instance Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Does Video-Text Pretraining Help Open-Vocabulary Online Action Detection?
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
Conditional Temporal Variational AutoEncoder for Action Video Prediction.
Int. J. Comput. Vis., October, 2023

Open World Entity Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

JourneyDB: A Benchmark for Generative Image Understanding.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Scaling Data Generation in Vision-and-Language Navigation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Learning Open-Vocabulary Semantic Segmentation Models From Natural Language Supervision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
PointINS: Point-Based Instance Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

PalGAN: Image Colorization with Palette Generative Adversarial Networks.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Implicit Text-Guided 3D Shape Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MAT: Mask-Aware Transformer for Large Hole Image Inpainting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Image Synthesis via Semantic Composition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
VCNet: A Robust Approach to Blind Image Inpainting.
Proceedings of the Computer Vision - ECCV 2020, 2020

Attentive Normalization for Conditional Image Generation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Wide-Context Semantic Image Extrapolation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Scale-recurrent Network for Deep Image Deblurring.
CoRR, 2018

Image Inpainting via Generative Multi-column Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2012
Interconnection of wind farms with grid using a MTDC network.
Proceedings of the 38th Annual Conference on IEEE Industrial Electronics Society, 2012

