2025

ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding.

Yi-Xing Peng

Qize Yang

CoRR, April, 2025

ViSpeak: Visual Instruction Feedback in Streaming Videos.

CoRR, March, 2025

A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection.

CoRR, March, 2025

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning.

Jiaxing Zhao

Xihan Wei

Liefeng Bo

CoRR, March, 2025

LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models.

CoRR, January, 2025

HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding.

CoRR, January, 2025

Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis.

CoRR, January, 2025

Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness.

CoRR, January, 2025

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding.

CoRR, January, 2025

2024

Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

DreamView: Injecting View-Specific Text Guidance Into Text-to-3D Generation.

Proceedings of the Computer Vision - ECCV 2024, 2024

2022

Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition.

Proceedings of the Computer Vision - ECCV 2022, 2022

SP-ViT: Learning 2D Spatial Priors for Vision Transformers.

Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021

Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Continual Local Replacement for Few-shot Image Recognition.

CoRR, 2020

2019

Learning Continually from Low-shot Data Stream.

CoRR, 2019