ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding.
CoRR, April, 2025
ViSpeak: Visual Instruction Feedback in Streaming Videos.
CoRR, March, 2025
A Hierarchical Semantic Distillation Framework for Open-Vocabulary Object Detection.
CoRR, March, 2025
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning.
CoRR, March, 2025
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models.
CoRR, January, 2025
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding.
CoRR, January, 2025
Omni-Emotion: Extending Video MLLM with Detailed Face and Audio Modeling for Multimodal Emotion Analysis.
CoRR, January, 2025
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness.
CoRR, January, 2025
LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding.
CoRR, January, 2025
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
DreamView: Injecting View-Specific Text Guidance Into Text-to-3D Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024
Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition.
Proceedings of the Computer Vision - ECCV 2022, 2022
SP-ViT: Learning 2D Spatial Priors for Vision Transformers.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022
Interactive Self-Training With Mean Teachers for Semi-Supervised Object Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Continual Local Replacement for Few-shot Image Recognition.
CoRR, 2020
Learning Continually from Low-shot Data Stream.
CoRR, 2019