2024
Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models.
CoRR, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
CoRR, 2024

Nemotron-4 340B Technical Report.
CoRR, 2024

Nemotron-4 15B Technical Report.
CoRR, 2024

Data, Data Everywhere: A Guide for Pretraining Dataset Construction.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2022
The Importance of Background Information for Out of Distribution Generalization.
CoRR, 2022

2021
Observational Supervision for Medical Image Classification Using Gaze Data.
Proceedings of the Medical Image Computing and Computer Assisted Intervention - MICCAI 2021 - 24th International Conference, Strasbourg, France, September 27, 2021

2020
Biomedical Information Extraction for Disease Gene Prioritization.
CoRR, 2020