Zijia Zhao

Orcid: 0009-0000-7781-932X

According to our database, Zijia Zhao authored at least 20 papers between 2021 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval.

CoRR, 2024

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining.

CoRR, 2024

Exploring the Design Space of Visual Context Representation in Video MLLMs.

CoRR, 2024

OneDiff: A Generalist Model for Image Difference Captioning.

CoRR, 2024

Towards Event-oriented Long Video Understanding.

CoRR, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs.

CoRR, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.

CoRR, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.

CoRR, 2024

Collaborative Training of Tiny-Large Vision Language Models.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

OneDiff: A Generalist Model for Image Difference Captioning.

Proceedings of the Computer Vision - ACCV 2024, 2024

2023

A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing.

Ahmed Yassin Al-Dubai

Zhiyuan Tan

Amir Hussain

IEEE J. Sel. Areas Commun., November, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.

CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.

Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Snake-inspired Swarm Robot Design for Distributed Underwater Search and Rescue.

Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.

CoRR, 2022

2021

OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.

CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
