2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.
CoRR, 2024

RedPajama: an Open Dataset for Training Large Language Models.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
DeltaZip: Multi-Tenant Language Model Serving via Delta Compression.
CoRR, 2023

DMLR: Data-centric Machine Learning Research - Past, Present and Future.
CoRR, 2023

HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment.
CoRR, 2023

DataPerf: Benchmarks for Data-Centric AI Development.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
SHiFT: An Efficient, Flexible Search Engine for Transfer Learning.
Proc. VLDB Endow., 2022

DataPerf: Benchmarks for Data-Centric AI Development.
CoRR, 2022

2018
CVTron Web: A Versatile Framework for Online Computer Vision Services.
Proceedings of Services - SERVICES 2018, 2018

2017
Face Based Advertisement Recommendation with Deep Learning: A Case Study.
Proceedings of Smart Computing and Communication, 2017