2023
Prolego: Time-Series Analysis for Predicting Failures in Complex Systems.
Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems, 2023

2022
Performance Variability and Causality in Complex Systems.
Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion, 2022

2021
Systemic Assessment of Node Failures in HPC Production Platforms.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
Aarohi: Making Real-Time Node Failure Prediction Feasible.
Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium, 2020

2018
KeyValueServe†: Design and performance analysis of a multi-tenant data grid as a cloud service.
Concurrency and Computation: Practice and Experience, 2018

Doomsday: predicting which node will fail when on supercomputers.
Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, 2018

Desh: deep learning for system health prediction of lead times to failure in HPC.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

2016
Performance Analysis of a Multi-tenant In-Memory Data Grid.
Proceedings of the 9th IEEE International Conference on Cloud Computing, 2016

2012
Dynamic resource management using virtual machine migrations.
IEEE Communications Magazine, 2012