Learning Spatial Similarity Distribution for Few-shot Object Counting.

Yuanwu Xu, Feifan Song, Haofeng Zhang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Memory-Augmented Transformer for Efficient End-to-End Video Grounding.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

CAMG: Context-Aware Moment Graph Network for Multimodal Temporal Activity Localization via Language.

Proceedings of the Natural Language Processing and Chinese Computing, 2023

SPTNET: Span-based Prompt Tuning for Video Grounding.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

Conditional Video-Text Reconstruction Network with Cauchy Mask for Weakly Supervised Temporal Sentence Grounding.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2023

STDNet: Spatio-Temporal Decomposed Network for Video Grounding.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2022