Jiabo Ye
Orcid: 0009-0009-5451-8984
According to our database, Jiabo Ye authored at least 27 papers between 2021 and 2024.
Bibliography
2024
ACM Trans. Multim. Comput. Commun. Appl., August, 2024
mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding.
CoRR, 2024
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models.
CoRR, 2024
CoRR, 2024
CoRR, 2024
Proceedings of the MultiMedia Modeling - 30th International Conference, 2024
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
A Sentimental Prompt Framework with Visual Text Encoder for Multimodal Sentiment Analysis.
Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024
VG-Annotator: Vision-Language Models as Query Annotators for Unsupervised Visual Grounding.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
MNER-MI: A Multi-image Dataset for Multimodal Named Entity Recognition in Social Media.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024
2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.
CoRR, 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
CoRR, 2023
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.
CoRR, 2023
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
CoRR, 2023
CoRR, 2023
Proceedings of the International Conference on Machine Learning, 2023
Pseudo-Query Generation For Semi-Supervised Visual Grounding With Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
2022
Inferring substitutable and complementary products with Knowledge-Aware Path Reasoning based on dynamic policy network.
Knowl. Based Syst., 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
CoRR, 2022
CAT-MNER: Multimodal Named Entity Recognition with Knowledge-Refined Cross-Modal Attention.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
PromptMNER: Prompt-Based Entity-Related Visual Clue Extraction and Integration for Multimodal Named Entity Recognition.
Proceedings of the Database Systems for Advanced Applications, 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021