Zijia Zhao

Orcid: 0009-0000-7781-932X

According to our database, Zijia Zhao authored at least 20 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: publications per year, 2021 to 2024, broken down by type (book, in proceedings, article, PhD thesis, dataset, other).]

Bibliography

2024
ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval.
CoRR, 2024

Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining.
CoRR, 2024

Exploring the Design Space of Visual Context Representation in Video MLLMs.
CoRR, 2024

OneDiff: A Generalist Model for Image Difference Captioning.
CoRR, 2024

Towards Event-oriented Long Video Understanding.
CoRR, 2024

Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs.
CoRR, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning.
CoRR, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.
CoRR, 2024

Collaborative Training of Tiny-Large Vision Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

OneDiff: A Generalist Model for Image Difference Captioning.
Proceedings of the Computer Vision - ACCV 2024, 2024

2023
A Digital Twin-Assisted Intelligent Partial Offloading Approach for Vehicular Edge Computing.
IEEE J. Sel. Areas Commun., November, 2023

ChatBridge: Bridging Modalities with Large Language Model as a Language Catalyst.
CoRR, 2023

MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.
Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

Snake-inspired Swarm Robot Design for Distributed Underwater Search and Rescue.
Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.
CoRR, 2022

2021
OPT: Omni-Perception Pre-Trainer for Cross-Modal Understanding and Generation.
CoRR, 2021

MM21 Pre-training for Video Understanding Challenge: Video Captioning with Pretraining Techniques.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

