Kathryn M. Mohror

ORCID: 0000-0002-1366-1655

Affiliations:
  • Lawrence Livermore National Laboratory, Livermore, CA, USA


According to our database, Kathryn M. Mohror authored at least 89 papers between 2004 and 2024.

Bibliography

2024
A Visual Comparison of Silent Error Propagation.
IEEE Trans. Vis. Comput. Graph., July, 2024

Formal Definitions and Performance Comparison of Consistency Models for Parallel File Systems.
IEEE Trans. Parallel Distributed Syst., June, 2024

DFTracer: An Analysis-Friendly Data Flow Tracer for AI-Driven Workflows.
Proceedings of the International Conference for High Performance Computing, 2024

The Impact of Asynchronous I/O in Checkpoint-Restart Workloads.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Understanding Highly Configurable Storage for Diverse Workloads.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems.
CoRR, 2023

IOMax: Maximizing Out-of-Core I/O Analysis Performance on HPC Systems.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Mimir: Extending I/O Interfaces to Express User Intent for Complex Workloads in HPC.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

UnifyFS: A User-level Shared File System for Unified Access to Distributed Local Storage.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

I/O characterization and performance evaluation of large-scale storage architectures for heterogeneous workloads.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
The COVID-19 High-Performance Computing Consortium.
Comput. Sci. Eng., 2022

DFMan: A Graph-based Optimization of Dataflow Scheduling on High-Performance Computing Systems.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Extracting and characterizing I/O behavior of HPC workloads.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
SpotSDC: Revealing the Silent Data Corruption Propagation in High-Performance Computing Systems.
IEEE Trans. Vis. Comput. Graph., 2021

Mitigating Inter-Job Interference via Process-Level Quality-of-Service.
ACM Trans. Parallel Comput., 2021

Understanding I/O Behavior in Scientific and Data-Intensive Computing (Dagstuhl Seminar 21332).
Dagstuhl Reports, 2021

Large-Scale Scientific Computing in the Fight Against COVID-19.
Comput. Sci. Eng., 2021

Interactive Supercomputing With Jupyter.
Comput. Sci. Eng., 2021

It's Time to Talk About HPC Storage: Perspectives on the Past and Future.
Comput. Sci. Eng., 2021

VELOC: VEry Low Overhead Checkpointing in the Age of Exascale.
CoRR, 2021

Understanding the use of message passing interface in exascale proxy applications.
Concurr. Comput. Pract. Exp., 2021

Understanding a program's resiliency through error propagation.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

File System Semantics Requirements of HPC Applications.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
QMPI: A next generation MPI profiling interface for modern HPC platforms.
Parallel Comput., 2020

Ad Hoc File Systems for High-Performance Computing.
J. Comput. Sci. Technol., 2020

EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.
Concurr. Comput. Pract. Exp., 2020

Extending the MPI Stages Model of Fault Tolerance.
Proceedings of the Workshop on Exascale MPI, 2020

Emulating I/O Behavior in Scientific Workflows on High Performance Computing Systems.
Proceedings of the Fifth IEEE/ACM International Parallel Data Systems Workshop, 2020

Recorder 2.0: Efficient Parallel I/O Tracing and Analysis.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

First IEEE International Workshop on High-Performance Storage (HPS).
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Understanding HPC Application I/O Behavior Using System Level Statistics.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019
Failure recovery for bulk synchronous applications with MPI stages.
Parallel Comput., 2019

The MPI_T events interface: An early evaluation and overview of the interface.
Parallel Comput., 2019

A large-scale study of MPI usage in open-source HPC applications.
Proceedings of the International Conference for High Performance Computing, 2019

VeloC: Towards High Performance Adaptive Asynchronous Checkpointing at Large Scale.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

I/O Characterization and Performance Evaluation of BeeGFS for Deep Learning.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Efficient User-Level Storage Disaggregation for Deep Learning.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

ExaMPI: A Modern Design and Implementation to Accelerate Message Passing Interface Innovation.
Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

2018
ADAPT: algorithmic differentiation applied to floating-point precision tuning.
Proceedings of the International Conference for High Performance Computing, 2018

MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Enabling callback-driven runtime introspection via MPI_T.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

DisCVar: discovering critical variables using algorithmic differentiation for transient faults.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Entropy-Aware I/O Pipelining for Large-Scale Deep Learning on HPC Systems.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

A Study of Network Quality of Service in Many-Core MPI Applications.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support.
Proceedings of the 8th International Workshop on Runtime and Operating Systems for Supercomputers, 2018

2017
Challenges and Opportunities of User-Level File Systems for HPC (Dagstuhl Seminar 17202).
Dagstuhl Reports, 2017

MetaKV: A Key-Value Store for Metadata Management of Distributed Burst Buffers.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

The Popper Convention: Making Reproducible Systems Evaluation Practical.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Demo abstract: PopperCI: Automated reproducibility validation.
Proceedings of the 2017 IEEE Conference on Computer Communications Workshops, 2017

PopperCI: Automated reproducibility validation.
Proceedings of the 2017 IEEE Conference on Computer Communications Workshops, 2017

Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration).
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017

2016
Standing on the Shoulders of Giants by Managing Scientific Experiments Like Software.
login Usenix Mag., 2016

I Aver: Providing Declarative Experiment Specifications Facilitates the Evaluation of Computer Systems Research.
Adv. Math. Commun., 2016

Evaluating and extending user-level fault tolerance in MPI applications.
Int. J. High Perform. Comput. Appl., 2016

Exploring the MPI tool information interface: features and capabilities.
Int. J. High Perform. Comput. Appl., 2016

An ephemeral burst-buffer file system for scientific applications.
Proceedings of the International Conference for High Performance Computing, 2016

Allowing MPI tools builders to forget about Fortran.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Structural Clustering: A New Approach to Support Performance Analysis at Scale.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

I/O Aware Power Shifting.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Characterizing and Reducing Cross-Platform Performance Variability Using OS-Level Virtualization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Managing I/O Interference in a Shared Burst Buffer System.
Proceedings of the 45th International Conference on Parallel Processing, 2016

2015
Tackling the reproducibility problem in storage systems research with declarative experiment specifications.
Proceedings of the 10th Parallel Data Storage Workshop, 2015

The Role of Container Technology in Reproducible Computer Systems Research.
Proceedings of the 2015 IEEE International Conference on Cloud Engineering, 2015

2014
Detailed Modeling and Evaluation of a Scalable Multilevel Checkpointing System.
IEEE Trans. Parallel Distributed Syst., 2014

Exploring the Capabilities of the New MPI_T Interface.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

IO-Cop: Managing Concurrent Accesses to Shared Parallel File System.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2013
McrEngine: A scalable checkpointing system using data-aware aggregation and compression.
Sci. Program., 2013

There goes the neighborhood: performance degradation due to nearby jobs.
Proceedings of the International Conference for High Performance Computing, 2013

HIPS Introduction.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A 1 PB/s file system to checkpoint three million MPI tasks.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Alignment-Based Metrics for Trace Comparison.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
Trace profiling: Scalable event tracing on high-end parallel systems.
Parallel Comput., 2012

Design and modeling of a non-blocking checkpointing system.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Integrated in-system storage architecture for high performance computing.
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers, 2012

Asynchronous checkpoint migration with MRNet in the Scalable Checkpoint / Restart Library.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

2010
Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System.
Proceedings of the Conference on High Performance Computing Networking, 2010

2009
Evaluating similarity-based trace reduction techniques for scalable performance analysis.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Scalable Event Trace Visualization.
Proceedings of the Euro-Par 2009, 2009

2007
Scalable event-based performance measurement in high-end environments.
SIGMETRICS Perform. Evaluation Rev., 2007

A study of tracing overhead on a high-performance Linux cluster.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Towards Scalable Event Tracing for High End Systems.
Proceedings of the High Performance Computing and Communications, 2007

2005
Integrating Database Technology with Comparison-based Parallel Performance Diagnosis: The PerfTrack Performance Experiment Management Tool.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

PPerfGrid: A Grid Services-based Tool for the Exchange of Heterogeneous Parallel Performance Data.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
Performance Tool Support for MPI-2 on Linux.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

