Scott Levy

ORCID: 0000-0002-2232-3201

According to our database, Scott Levy authored at least 55 papers between 2011 and 2024.

Bibliography

2024
Special Issue on Hot Interconnects 30.
IEEE Micro, 2024

2023
Special Issue on Hot Interconnects 29.
IEEE Micro, 2023

Using Benford's Law to Identify Unusual Failure Regions.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Evaluating the Viability of LogGP for Modeling MPI Performance with Non-contiguous Datatypes on Modern Architectures.
Proceedings of the 30th European MPI Users' Group Meeting, 2023

Measuring Thread Timing to Assess the Feasibility of Early-bird Message Delivery.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

Modeling and Benchmarking the Potential Benefit of Early-Bird Transmission in Fine-Grained Communication.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

A Dynamic Network-Native MPI Partitioned Aggregation Over InfiniBand Verbs.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
"Smarter" NICs for faster molecular dynamics: a case study.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Understanding Memory Failures on a Petascale Arm System.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

2021
Evaluating MPI resource usage summary statistics.
Parallel Comput., 2021

Characterizing Memory Failures Using Benford's Law.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

MiniMod: A Modular Miniapplication Benchmarking Framework for HPC.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

pMEMCPY: a simple, lightweight, and portable I/O library for storing data in persistent memory.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Understanding the Effects of DRAM Correctable Error Logging at Scale.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
The Program with a Personality: Analysis of Elk Cloner, the First Personal Computer Virus.
CoRR, 2020

The unexpected virtue of almost: Exploiting MPI collective operations to approximately coordinate checkpoints.
Concurr. Comput. Pract. Exp., 2020

Hardware MPI message matching: Insights into MPI matching behavior to inform design.
Concurr. Comput. Pract. Exp., 2020

ALAMO: Autonomous Lightweight Allocation, Management, and Optimization.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Message from the Workshop Chair.
Proceedings of the 10th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2020

RaDD Runtimes: Radical and Different Distributed Runtimes with SmartNICs.
Proceedings of the Fourth IEEE/ACM Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2020

Evaluating MPI Message Size Summary Statistics.
Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

The Case for Explicit Reuse Semantics for RDMA Communication.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Low-cost MPI Multithreaded Message Matching Benchmarking.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

2019
Using simulation to examine the effect of MPI message matching costs on application performance.
Parallel Comput., 2019

Mediating Data Center Storage Diversity in HPC Applications with FAODEL.
Proceedings of the High Performance Computing, 2019

Evaluating tradeoffs between MPI message matching offload hardware capacity and performance.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

Space-Efficient Reed-Solomon Encoding to Detect and Correct Pointer Corruption.
Proceedings of the Euro-Par 2019: Parallel Processing Workshops, 2019

2018
Characterizing MPI matching via trace-based simulation.
Parallel Comput., 2018

Lessons learned from memory errors observed over the lifetime of Cielo.
Proceedings of the International Conference for High Performance Computing, 2018

Using Simulation to Examine the Effect of MPI Message Matching Costs on Application Performance.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Open Science on Trinity's Knights Landing Partition: An Analysis of User Job Data.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Faodel: Data Management for Next-Generation Application Workflows.
Proceedings of the 9th Workshop on Scientific Cloud Computing, 2018

2017
Empress: extensible metadata provider for extreme-scale scientific simulations.
Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 2017

It's Not the Heat, It's the Humidity: Scheduling Resilience Activity at Scale.
Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

Lifetime memory reliability data from the field.
Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2017

Evaluating the Viability of Using Compression to Mitigate Silent Corruption of Read-Mostly Application Data.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
On noise and the performance benefit of nonblocking collectives.
Int. J. High Perform. Comput. Appl., 2016

Understanding performance interference in next-generation HPC systems.
Proceedings of the International Conference for High Performance Computing, 2016

Improving application resilience to memory errors with lightweight compression.
Proceedings of the International Conference for High Performance Computing, 2016

How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

An Examination of the Impact of Failure Distribution on Coordinated Checkpoint/Restart.
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2016

Horseshoes and Hand Grenades: The Case for Approximate Coordination in Local Checkpointing Protocols.
Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Improving DRAM Fault Characterization through Machine Learning.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2016

Scheduling In-Situ Analytics in Next-Generation Applications.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015
A study of the viability of exploiting memory content similarity to improve resilience to memory errors.
Int. J. High Perform. Comput. Appl., 2015

Canaries in a Coal Mine: Using Application-Level Checkpoints to Detect Memory Failures.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

2014
Understanding the Effects of Communication and Coordination on Checkpointing at Scale.
Proceedings of the International Conference for High Performance Computing, 2014

Exploring the effect of noise on the performance benefit of nonblocking allreduce.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Characterizing the Impact of Rollback Avoidance at Extreme-Scale: A Modeling Approach.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

2013
Using Simulation to Evaluate the Performance of Resilience Strategies at Scale.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Exploiting Content Similarity to Improve Memory Performance in Large-Scale High-Performance Computing Systems.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Evaluating the feasibility of using memory content similarity to improve system resilience.
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, 2013

Using unreliable virtual hardware to inject errors in extreme-scale systems.
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, 2013

Asking the Right Questions: Benchmarking Fault-Tolerant Extreme-Scale Systems.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

2011
Exploiting MISD Performance Opportunities in Multi-core Systems.
Proceedings of the 13th Workshop on Hot Topics in Operating Systems, 2011

