TAP: Text-Aware Pre-Training for Text-VQA and Text-Caption.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
VIVO: Surpassing Human Performance in Novel Object Captioning with Visual Vocabulary Pre-Training.
CoRR, 2020
Hashing-based Non-Maximum Suppression for Crowded Object Detection.
CoRR, 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks.
Proceedings of the European Conference on Computer Vision (ECCV), 2020