2025
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models.
CoRR, April 2025

2023
Tectonic-Shift: A Composite Storage Fabric for Large-Scale ML Training.
Proceedings of the 2023 USENIX Annual Technical Conference, 2023

RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

2022
Understanding data storage and ingestion for large-scale deep recommendation model training: industrial product.
Proceedings of ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18-22, 2022

Software-hardware co-design for fast and scalable training of deep learning recommendation models.
Proceedings of ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18-22, 2022

2021
Understanding and Co-designing the Data Ingestion Pipeline for Industry-Scale RecSys Training.
CoRR, 2021

High-performance, Distributed Training of Large-scale Deep Learning Recommendation Models.
CoRR, 2021