Kunchang Li

ORCID: 0000-0001-5612-0341

Affiliations:
  • Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences, Shenzhen, China
  • University of Chinese Academy of Sciences, Beijing, China


According to our database, Kunchang Li authored at least 42 papers between 2021 and 2024.

Bibliography

2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.
CoRR, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration.
CoRR, 2024

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration.
CoRR, 2024

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding.
CoRR, 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Vlogger: Make Your Dream A Vlog.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

Hybrid token transformer for deep face recognition.
Pattern Recognit., July, 2023

A Progressive Difference Method for Capturing Visual Tempos on Action Recognition.
IEEE Trans. Circuits Syst. Video Technol., March, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification.
CoRR, 2022

MVP: Robust Multi-View Practice for Driving Action Localization.
CoRR, 2022

Illumination Adaptive Transformer.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

Pose-guided Generative Adversarial Net for Novel View Action Synthesis.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

MVP: Robust Multi-View Practice for Driving Action Localization.
Proceedings of the 5th IEEE International Conference on Information Systems and Computer Aided Education, 2022

Self-slimmed Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

Tip-Adapter: Training-Free Adaption of CLIP for Few-Shot Classification.
Proceedings of the Computer Vision - ECCV 2022, 2022

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

PointCLIP: Point Cloud Understanding by CLIP.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video.
CoRR, 2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling.
CoRR, 2021

CT-Net: Channel Tensorization Network for Video Classification.
Proceedings of the 9th International Conference on Learning Representations, 2021

End-to-End Object Detection with Adaptive Clustering Transformer.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021
