Haoyu Cao

Orcid: 0000-0002-3789-9705

According to our database, Haoyu Cao authored at least 14 papers between 2022 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Turning a CLIP Model Into a Scene Text Spotter.
IEEE Trans. Pattern Anal. Mach. Intell., September, 2024

Communication-efficient clustered federated learning via model distance.
Mach. Learn., June, 2024

HRVDA: High-Resolution Visual Document Assistant.
CoRR, 2024

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

HRVDA: High-Resolution Visual Document Assistant.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Few-shot Temporal Pruning Accelerates Diffusion Models for Text Generation.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
CoRR, 2023

ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

2022
GMN: Generative Multi-modal Network for Practical Document Information Extraction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Relational Representation Learning in Visually-Rich Documents.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Query-driven Generative Network for Document Information Extraction in the Wild.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

