2025
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models.
CoRR, February, 2025

2024
The Semantic Reader Project.
Commun. ACM, October, 2024

2023
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces.
CoRR, 2023

The Semantic Scholar Open Data Platform.
CoRR, 2023

PaperMage: A Unified Toolkit for Processing, Representing, and Manipulating Visually-Rich Scientific Documents.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023