Saurabh Gupta

Affiliations:
  • Intel Labs
  • Oak Ridge National Laboratory, USA


According to our database, Saurabh Gupta authored at least 23 papers between 2013 and 2021.

Bibliography

2021
Study of interconnect errors, network congestion, and applications characteristics for throttle prediction on a large scale HPC system.
J. Parallel Distributed Comput., 2021

2018
Exploring the Optimal Platform Configuration for Power-Constrained HPC Workflows.
Proceedings of the 27th International Conference on Computer Communication and Networks, 2018

Machine Learning Models for GPU Error Prediction in a Large Scale HPC System.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018

Understanding and Analyzing Interconnect Errors and Network Congestion on a Large Scale HPC System.
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2018

2017
Failures in large scale systems: long-term measurement, analysis, and implications.
Proceedings of the International Conference for High Performance Computing, 2017

Characterizing Temperature, Power, and Soft-Error Behaviors in Data Center Systems: Insights, Challenges, and Opportunities.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

Effective Running of End-to-End HPC Workflows on Emerging Heterogeneous Architectures.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
A multi-faceted approach to job placement for improved performance on extreme-scale systems.
Proceedings of the International Conference for High Performance Computing, 2016

Reducing Waste in Extreme Scale Systems through Introspective Analysis.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Adaptive Power Profiling for Many-Core HPC Architectures.
Proceedings of the 2016 IEEE International Conference on Autonomic Computing, 2016

A large-scale study of soft-errors on GPUs in the field.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Power-Capping Aware Checkpointing: On the Interplay Among Power-Capping, Temperature, Reliability, Performance, and Energy.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2016

A model-driven approach to warp/thread-block level GPU cache bypassing.
Proceedings of the 53rd Annual Design Automation Conference, 2016

2015
Reliability lessons learned from GPU experience with the Titan supercomputer at Oak Ridge leadership computing facility.
Proceedings of the International Conference for High Performance Computing, 2015

Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Understanding and Exploiting Spatial Properties of System Failures on Extreme-Scale HPC Systems.
Proceedings of the 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015

2014
Best Practices and Lessons Learned from Deploying and Operating Large-Scale Data-Centric Parallel File Systems.
Proceedings of the International Conference for High Performance Computing, 2014

Improving large-scale storage system performance via topology-aware and balanced data placement.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Lazy Checkpointing: Exploiting Temporal Locality in Failures to Mitigate Checkpointing Overheads on Extreme-Scale Systems.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

2013
Locality principle revisited: A probability-based quantitative approach.
J. Parallel Distributed Comput., 2013

Analyzing locality of memory references in GPU architectures.
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, 2013

Adaptive Cache Bypassing for Inclusive Last Level Caches.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

