Yali Wang

Orcid: 0000-0001-8415-790X

Affiliations:

Chinese Academy of Sciences, Shenzhen Institute of Advanced Technology, Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, China

According to our database, Yali Wang authored at least 93 papers between 2016 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Percept, Chat, Adapt: Knowledge transfer of foundation models for open-world video recognition.

Pattern Recognit., 2025

2024

F2S-Net: learning frame-to-segment prediction for online action detection.

Yi Liu

Yu Qiao

Yali Wang

J. Real Time Image Process., May, 2024

CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning.

IEEE Trans. Multim., 2024

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery.

IEEE Trans. Multim., 2024

Attentive Snippet Prompting for Video Retrieval.

IEEE Trans. Multim., 2024

Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection.

IEEE Trans. Image Process., 2024

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model.

CoRR, 2024

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment.

CoRR, 2024

CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding.

CoRR, 2024

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel.

CoRR, 2024

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.

CoRR, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration.

CoRR, 2024

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration.

CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.

CoRR, 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition.

CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.

CoRR, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.

Proceedings of the Computer Vision - ECCV 2024, 2024

Vlogger: Make Your Dream A Vlog.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

M-BEV: Masked BEV Perception for Robust Autonomous Driving.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.

IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.

IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Hybrid token transformer for deep face recognition.

Pattern Recognit., July, 2023

Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset.

Comput. Vis. Media, February, 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.

CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.

CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.

CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.

CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.

CoRR, 2023

VideoChat: Chat-Centric Video Understanding.

CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.

CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

CoRR, 2023

Learning Discriminative Feature Representation for Open Set Action Recognition.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Action Recognition With Motion Diversification and Dynamic Selection.

IEEE Trans. Image Process., 2022

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.

IEEE Trans. Image Process., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.

CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.

CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.

CoRR, 2022

Low-Resolution Action Recognition for Tiny Actions Challenge.

Boyu Chen

Yu Qiao

Yali Wang

CoRR, 2022

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.

CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.

CoRR, 2022

Visual Knowledge Graph for Human Action Reasoning in Videos.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection.

Proceedings of the 26th International Conference on Pattern Recognition, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Self-slimmed Vision Transformer.

Proceedings of the Computer Vision - ECCV 2022, 2022

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning.

Proceedings of the Computer Vision - ECCV 2022, 2022

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Wildfish++: A Comprehensive Fish Benchmark for Multimedia Research.

Peiqin Zhuang

Yali Wang

Yu Qiao

IEEE Trans. Multim., 2021

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos.

IEEE Trans. Image Process., 2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.

Francisco Gómez Fernández

Qinlong Wang

Yang Yang

CoRR, 2021

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video.

CoRR, 2021

FineAction: A Fined Video Dataset for Temporal Action Localization.

CoRR, 2021

CT-Net: Channel Tensorization Network for Video Classification.

Proceedings of the 9th International Conference on Learning Representations, 2021

Digging into Uncertainty in Self-supervised Multi-view Stereo.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Progressive Object Transfer Detection.

IEEE Trans. Image Process., 2020

DID: Disentangling-Imprinting-Distilling for Continuous Low-Shot Detection.

IEEE Trans. Image Process., 2020

Finding hard faces with better proposals and classifier.

Mach. Vis. Appl., 2020

Mining Inter-Video Proposal Relations for Video Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

SmallBigNet: Integrating Core and Contextual Views for Video Classification.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Attentive Pairwise Interaction for Fine-Grained Classification.

Peiqin Zhuang

Yali Wang

Yu Qiao

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Context-Transformer: Tackling Object Confusion for Few-Shot Detection.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Dual-supervised attention network for deep cross-modal hashing.

Pattern Recognit. Lett., 2019

MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition.

Weihe Zhang

Yali Wang

Yu Qiao

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PA3D: Pose-Action 3D Machine for Video Recognition.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adaptive Pyramid Context Network for Semantic Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.

Wenbin Du

Yali Wang

Yu Qiao

IEEE Trans. Image Process., 2018

WildFish: A Large Benchmark for Fish Recognition in the Wild.

Peiqin Zhuang

Yali Wang

Yu Qiao

Proceedings of the 2018 ACM Multimedia Conference, 2018

Temporal Hallucinating for Action Recognition With Few Still Images.

Yali Wang

Lei Zhou

Yu Qiao

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

LSTD: A Low-Shot Transfer Detector for Object Detection.

Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.

IEEE Trans. Image Process., 2017

Bayesian inference for time-varying applications: Particle-based Gaussian process approaches.

Yali Wang

Brahim Chaib-draa

Neurocomputing, 2017

An online Bayesian filtering framework for Gaussian process regression: Application to global surface temperature analysis.

Yali Wang

Brahim Chaib-draa

Expert Syst. Appl., 2017

RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos.

Wenbin Du

Yali Wang

Yu Qiao

Proceedings of the IEEE International Conference on Computer Vision, 2017

Sparse Deep Transfer Learning for Convolutional Neural Network.

Jiaming Liu

Yali Wang

Yu Qiao

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016

KNN-based Kalman filter: An efficient and non-stationary method for Gaussian process regression.

Yali Wang

Brahim Chaib-draa

Knowl. Based Syst., 2016

Codebook enhancement of VLAD representation for visual recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Human action recognition with DeepAction Kernel Gaussian Process.

Yali Wang

Lin Li

Yu Qiao

Proceedings of the 2016 International Conference on Advanced Robotics and Mechatronics, 2016
