Saurabh Gupta
Affiliations:
- Intel Labs
- Oak Ridge National Laboratory, USA
According to our database, Saurabh Gupta authored at least 23 papers between 2013 and 2021.
Bibliography
2021
Study of interconnect errors, network congestion, and applications characteristics for throttle prediction on a large scale HPC system.
J. Parallel Distributed Comput., 2021
2018
Proceedings of the 27th International Conference on Computer Communication and Networks, 2018
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018
Understanding and Analyzing Interconnect Errors and Network Congestion on a Large Scale HPC System.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018
2017
Proceedings of the International Conference for High Performance Computing, 2017
Characterizing Temperature, Power, and Soft-Error Behaviors in Data Center Systems: Insights, Challenges, and Opportunities.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017
Effective Running of End-to-End HPC Workflows on Emerging Heterogeneous Architectures.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
2016
A multi-faceted approach to job placement for improved performance on extreme-scale systems.
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 2016 IEEE International Conference on Autonomic Computing, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
Power-Capping Aware Checkpointing: On the Interplay Among Power-Capping, Temperature, Reliability, Performance, and Energy.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2016
Proceedings of the 53rd Annual Design Automation Conference, 2016
2015
Reliability lessons learned from GPU experience with the Titan supercomputer at Oak Ridge leadership computing facility.
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 44th International Conference on Parallel Processing, 2015
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015
2014
Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems.
Proceedings of the International Conference for High Performance Computing, 2014
Improving large-scale storage system performance via topology-aware and balanced data placement.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014
2013
J. Parallel Distributed Comput., 2013
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013