Adaptive Keyframe Sampling for Long Video Understanding.
CoRR, February, 2025
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network With Token Migration.
IEEE Trans. Pattern Anal. Mach. Intell., December, 2024
ChatterBox: Multi-round Multimodal Referring and Grounding.
CoRR, 2024
Artemis: Towards Referential Understanding in Complex Videos.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024