Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models.
CoRR, January, 2025
Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models.
CoRR, January, 2025
Object-Centric Cross-Modal Knowledge Reasoning for Future Event Prediction in Videos.
IEEE Trans. Circuits Syst. Video Technol., December, 2024
Leveraging Smooth Deformation Augmentation for LiDAR Point Cloud Semantic Segmentation.
IEEE Trans. Intell. Veh., February, 2024
DADL: Double Asymmetric Distribution Learning for head pose estimation in wisdom museum.
J. King Saud Univ. Comput. Inf. Sci., January, 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024
GCANet: Geometry cues-aware facial expression recognition based on graph convolutional networks.
J. King Saud Univ. Comput. Inf. Sci., July, 2023
Weakly Supervised Learning of Semantic Correspondence through Cascaded Online Correspondence Refinement.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Infrared facial expression recognition via Gaussian-based label distribution learning in the dark illumination environment for human emotion detection.
Neurocomputing, 2020