2025

olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models.

CoRR, February, 2025

2024

The Semantic Reader Project.

Commun. ACM, October, 2024

2023

The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces.

CoRR, 2023

The Semantic Scholar Open Data Platform.

CoRR, 2023

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023