Yali Wang

Orcid: 0000-0001-8415-790X

Affiliations:
  • Chinese Academy of Sciences, Shenzhen Institute of Advanced Technology, Guangdong-Hong Kong-Macao Joint Laboratory of Human-Machine Intelligence-Synergy Systems, China


According to our database, Yali Wang authored at least 88 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
F2S-Net: learning frame-to-segment prediction for online action detection.
J. Real Time Image Process., May, 2024

CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning.
IEEE Trans. Multim., 2024

Dual Masked Modeling for Weakly-Supervised Temporal Boundary Discovery.
IEEE Trans. Multim., 2024

Attentive Snippet Prompting for Video Retrieval.
IEEE Trans. Multim., 2024

Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection.
IEEE Trans. Image Process., 2024

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning.
CoRR, 2024

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration.
CoRR, 2024

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration.
CoRR, 2024

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation.
CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
CoRR, 2024

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding.
CoRR, 2024

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition.
CoRR, 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities.
CoRR, 2024

MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

InternVideo2: Scaling Foundation Models for Multimodal Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

VideoMamba: State Space Model for Efficient Video Understanding.
Proceedings of the Computer Vision - ECCV 2024, 2024

Vlogger: Make Your Dream A Vlog.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

M-BEV: Masked BEV Perception for Robust Autonomous Driving.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
UniFormer: Unifying Convolution and Self-Attention for Visual Recognition.
IEEE Trans. Pattern Anal. Mach. Intell., October, 2023

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.
IEEE Trans. Pattern Anal. Mach. Intell., August, 2023

Hybrid token transformer for deep face recognition.
Pattern Recognit., July, 2023

Towards robustness and generalization of point cloud representation: A geometry coding method and a large-scale object-level dataset.
Comput. Vis. Media, February, 2023

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding.
CoRR, 2023

MVBench: A Comprehensive Multi-modal Video Understanding Benchmark.
CoRR, 2023

Harvest Video Foundation Models via Efficient Post-Pretraining.
CoRR, 2023

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation.
CoRR, 2023

VideoLLM: Modeling Video Sequence with Large Language Models.
CoRR, 2023

VideoChat: Chat-Centric Video Understanding.
CoRR, 2023

InternGPT: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language.
CoRR, 2023

Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
CoRR, 2023

Learning Discriminative Feature Representation for Open Set Action Recognition.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

UniFormerV2: Unlocking the Potential of Image ViTs for Video Understanding.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Unmasked Teacher: Towards Training-Efficient Video Foundation Models.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

HTML: Hybrid Temporal-scale Multimodal Learning Framework for Referring Video Object Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Starting from Non-Parametric Networks for 3D Point Cloud Analysis.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Action Recognition With Motion Diversification and Dynamic Selection.
IEEE Trans. Image Process., 2022

FineAction: A Fine-Grained Video Dataset for Temporal Action Localization.
IEEE Trans. Image Process., 2022

InternVideo: General Video Foundation Models via Generative and Discriminative Learning.
CoRR, 2022

UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer.
CoRR, 2022

InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges.
CoRR, 2022

Low-Resolution Action Recognition for Tiny Actions Challenge.
CoRR, 2022

CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm.
CoRR, 2022

UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning.
CoRR, 2022

Visual Knowledge Graph for Human Action Reasoning in Videos.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

VideoPipe 2022 Challenge: Real-World Video Understanding for Urban Pipe Inspection.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Self-slimmed Vision Transformer.
Proceedings of the Computer Vision - ECCV 2022, 2022

MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning.
Proceedings of the Computer Vision - ECCV 2022, 2022

Target-Relevant Knowledge Preservation for Multi-Source Domain Adaptive Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Wildfish++: A Comprehensive Fish Benchmark for Multimedia Research.
IEEE Trans. Multim., 2021

Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos.
IEEE Trans. Image Process., 2021

Multi-View Partial (MVP) Point Cloud Challenge 2021 on Completion and Registration: Methods and Results.
CoRR, 2021

MorphMLP: A Self-Attention Free, MLP-Like Backbone for Image and Video.
CoRR, 2021

FineAction: A Fined Video Dataset for Temporal Action Localization.
CoRR, 2021

CT-Net: Channel Tensorization Network for Video Classification.
Proceedings of the 9th International Conference on Learning Representations, 2021

Digging into Uncertainty in Self-supervised Multi-view Stereo.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

PC-HMR: Pose Calibration for 3D Human Mesh Recovery from 2D Images/Videos.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
Progressive Object Transfer Detection.
IEEE Trans. Image Process., 2020

DID: Disentangling-Imprinting-Distilling for Continuous Low-Shot Detection.
IEEE Trans. Image Process., 2020

Finding hard faces with better proposals and classifier.
Mach. Vis. Appl., 2020

Mining Inter-Video Proposal Relations for Video Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

SmallBigNet: Integrating Core and Contextual Views for Video Classification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Learning Attentive Pairwise Interaction for Fine-Grained Classification.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Context-Transformer: Tackling Object Confusion for Few-Shot Detection.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Dual-supervised attention network for deep cross-modal hashing.
Pattern Recognit. Lett., 2019

MetaCleaner: Learning to Hallucinate Clean Representations for Noisy-Labeled Visual Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

PA3D: Pose-Action 3D Machine for Video Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Adaptive Pyramid Context Network for Semantic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos.
IEEE Trans. Image Process., 2018

WildFish: A Large Benchmark for Fish Recognition in the Wild.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Temporal Hallucinating for Action Recognition With Few Still Images.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

LSTD: A Low-Shot Transfer Detector for Object Detection.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition.
IEEE Trans. Image Process., 2017

Bayesian inference for time-varying applications: Particle-based Gaussian process approaches.
Neurocomputing, 2017

An online Bayesian filtering framework for Gaussian process regression: Application to global surface temperature analysis.
Expert Syst. Appl., 2017

RPAN: An End-to-End Recurrent Pose-Attention Network for Action Recognition in Videos.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Sparse Deep Transfer Learning for Convolutional Neural Network.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
KNN-based Kalman filter: An efficient and non-stationary method for Gaussian process regression.
Knowl. Based Syst., 2016

Codebook enhancement of VLAD representation for visual recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Human action recognition with DeepAction Kernel Gaussian Process.
Proceedings of the 2016 International Conference on Advanced Robotics and Mechatronics, 2016

