RTime-QA: A Benchmark for Atomic Temporal Event Understanding in Large Multi-modal Models.

Yuqi Liu, Qin Jin, et al.

CoRR, May, 2025

VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning.

CoRR, May, 2025

Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?

CoRR, March, 2025

FILP-3D: Enhancing 3D few-shot class-incremental learning with pre-trained vision-language models.

Pattern Recognit., 2025

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition.

CoRR, 2024

An Improved Baseline for Reasoning Segmentation with Large Language Model.

CoRR, 2023