Zijia Zhao

Orcid: 0009-0000-7781-932X

According to our database, Zijia Zhao authored at least 16 papers between 2021 and 2024.

Bibliography

2024
OneDiff: A Generalist Model for Image Difference Captioning.
CoRR, 2024

Towards Event-oriented Long Video Understanding.
CoRR, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs.
CoRR, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.
CoRR, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.
CoRR, 2024

Collaborative Training of Tiny-Large Vision Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing.
IEEE J. Sel. Areas Commun., November, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.
CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Snake-inspired Swarm Robot Design for Distributed Underwater Search and Rescue.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.
CoRR, 2022

2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.
CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

