Ignacio Laguna

Orcid: 0000-0002-9374-4433

Affiliations:
  • Lawrence Livermore National Laboratory, CA, USA


According to our database, Ignacio Laguna authored at least 93 papers between 2007 and 2024.

Bibliography

2024
An automated OpenMP mutation testing framework for performance optimization.
Parallel Comput., 2024

Testing the Unknown: A Framework for OpenMP Testing via Random Program Generation.
CoRR, 2024

Testing GPU Numerics: Finding Numerical Differences Between NVIDIA and AMD GPUs.
CoRR, 2024

MUPPET: Optimizing Performance in OpenMP via Mutation Testing.
Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, 2024

Input Range Generation for Compiler-Induced Numerical Inconsistencies.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

FPBOXer: Efficient Input-Generation for Targeting Floating-Point Exceptions in GPU Programs.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

Distributed Order Recording Techniques for Efficient Record-and-Replay of Multi-Threaded Programs.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Understanding Mixed Precision GEMM with MPGemmFI: Insights into Fault Resilience.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Enhancing Performance Through Control-Flow Unmerging and Loop Unrolling on GPUs.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

Discovery of Floating-Point Differences Between NVIDIA and AMD GPUs.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

FTTN: Feature-Targeted Testing for Numerical Properties of NVIDIA & AMD Matrix Accelerators.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

2023
Finding inputs that trigger floating-point exceptions in heterogeneous computing via Bayesian optimization.
Parallel Comput., September, 2023

Approximate High-Performance Computing: A Fast and Energy-Efficient Computing Paradigm in the Post-Moore Era.
IT Prof., 2023

MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications.
CoRR, 2023

Expression Isolation of Compiler-Induced Numerical Inconsistencies in Heterogeneous Code.
Proceedings of the High Performance Computing - 38th International Conference, 2023

Understanding System Resilience for Converged Computing of Cloud, Edge, and HPC.
Proceedings of the High Performance Computing, 2023

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay.
Proceedings of the International Conference for High Performance Computing, 2023

Design and Evaluation of GPU-FPX: A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

2022
Giving Research Software Engineers a Larger Stage Through the Better Scientific Software Fellowship.
Comput. Sci. Eng., 2022

Giving RSEs a Larger Stage through the Better Scientific Software Fellowship.
CoRR, 2022

Toward Increasing Trust in Exascale Simulations.
Proceedings of the 4th Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing, 2022

Approximate Computing Through the Lens of Uncertainty Quantification.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Finding Inputs that Trigger Floating-Point Exceptions in GPUs via Bayesian Optimization.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

BinFPE: accurate floating-point exception detection for GPU applications.
Proceedings of the SOAP '22: 11th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis, 2022

Piper: Pipelining OpenMP Offloading Execution Through Compiler Optimization For Performance.
Proceedings of the IEEE/ACM International Workshop on Performance, 2022

FPChecker: Floating-Point Exception Detection Tool and Benchmark for Parallel and Distributed HPC.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Towards Precision-Aware Fault Tolerance Approaches for Mixed-Precision Applications.
Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

2021
PredCom: A Predictive Approach to Collecting Approximated Communication Traces.
IEEE Trans. Parallel Distributed Syst., 2021

PARIS: Predicting application resilience using machine learning.
J. Parallel Distributed Comput., 2021

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance.
CoRR, 2021

Report of the Workshop on Program Synthesis for Scientific Computing.
CoRR, 2021

Understanding the use of message passing interface in exascale proxy applications.
Concurr. Comput. Pract. Exp., 2021

Keeping science on keel when software moves.
Commun. ACM, 2021

HPAC: evaluating approximate computing techniques on HPC OpenMP applications.
Proceedings of the International Conference for High Performance Computing, 2021

Examining Failures and Repairs on Supercomputers with Multi-GPU Compute Nodes.
Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

Guarding Numerics Amidst Rising Heterogeneity.
Proceedings of the 5th IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2021

Co-Designing Multi-Level Checkpoint Restart for MPI Applications.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.
Concurr. Comput. Pract. Exp., 2020

Reinit++: Evaluating the Performance of Global-Restart Recovery Methods for MPI Fault Tolerance.
Proceedings of the High Performance Computing - 35th International Conference, 2020

OMPRacer: a scalable and precise static race detector for OpenMP programs.
Proceedings of the International Conference for High Performance Computing, 2020

Extending the MPI Stages Model of Fault Tolerance.
Proceedings of the Workshop on Exascale MPI, 2020

pLiner: isolating lines of floating-point code for compiler-induced variability.
Proceedings of the International Conference for High Performance Computing, 2020

ArcherGear: data race equivalencing for expeditious HPC debugging.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Detecting and reproducing error-code propagation bugs in MPI implementations.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

FAROS: A Framework to Analyze OpenMP Compilation Through Benchmarking and Compiler Optimization Analysis.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Varity: Quantifying Floating-Point Variations in HPC Systems Through Randomized Testing.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

HPC-MixPBench: An HPC Benchmark Suite for Mixed-Precision Analysis.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

MATCH: An MPI Fault Tolerance Benchmark Suite.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

2019
Failure recovery for bulk synchronous applications with MPI stages.
Parallel Comput., 2019

Pruners.
Int. J. High Perform. Comput. Appl., 2019

GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications.
Proceedings of the High Performance Computing - 34th International Conference, 2019

A large-scale study of MPI usage in open-source HPC applications.
Proceedings of the International Conference for High Performance Computing, 2019

FPChecker: Detecting Floating-Point Exceptions in GPU Applications.
Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, 2019

SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

AMPT-GA: automatic mixed precision floating point tuning for GPU applications.
Proceedings of the ACM International Conference on Supercomputing, 2019

Multi-Level Analysis of Compiler-Induced Variability and Performance Tradeoffs.
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

ExaMPI: A Modern Design and Implementation to Accelerate Message Passing Interface Innovation.
Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

2018
FlipTracker: understanding natural error resilience in HPC applications.
Proceedings of the International Conference for High Performance Computing, 2018

MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2017
Exploring versioned distributed arrays for resilience in scientific applications.
Int. J. High Perform. Comput. Appl., 2017

Report of the HPC Correctness Summit, Jan 25-26, 2017, Washington, DC.
CoRR, 2017

Snowpack: efficient parameter choice for GPU kernels via static analysis and statistical prediction.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

REFINE: realistic fault injection via compiler-based instrumentation for accuracy, portability and speed.
Proceedings of the International Conference for High Performance Computing, 2017

Noise Injection Techniques to Expose Subtle and Unintended Message Races.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Understanding the Spatial Characteristics of DRAM Errors in HPC Clusters.
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2017

2016
Evaluating and extending user-level fault tolerance in MPI applications.
Int. J. High Perform. Comput. Appl., 2016

Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications.
Proceedings of the International Conference for High Performance Computing, 2016

Testing Infrastructure for OpenMP Debugging Interface Implementations.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

IPAS: intelligent protection against silent output corruption in scientific applications.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

2015
Diagnosis of Performance Faults in Large Scale MPI Applications via Probabilistic Progress-Dependence Inference.
IEEE Trans. Parallel Distributed Syst., 2015

Debugging high-performance computing applications at massive scales.
Commun. ACM, 2015

Clock delta compression for scalable order-replay of non-deterministic parallel applications.
Proceedings of the International Conference for High Performance Computing, 2015

Lessons Learned from Implementing OMPD: A Debugging Interface for OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience.
Proceedings of the International Conference on Computational Science, 2015

2014
Towards providing low-overhead data race detection for large OpenMP applications.
Proceedings of the 2014 LLVM Compiler Infrastructure in HPC, 2014

Evaluating User-Level Fault Tolerance for MPI Applications.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Accurate application progress analysis for large-scale parallel debugging.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

2013
Automatic Problem Localization via Multi-dimensional Metric Profiling.
Proceedings of the IEEE 32nd Symposium on Reliable Distributed Systems, 2013

A study of application-level recovery methods for transient network faults.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset.
Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering, 2013

Performance Analysis Techniques for the Exascale Co-Design Process.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

2012
Automatic fault characterization via abnormality-enhanced classification.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

Probabilistic diagnosis of performance faults in large-scale parallel applications.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Large scale debugging of parallel tasks with AutomaDeD.
Proceedings of the Conference on High Performance Computing Networking, 2011

2010
AutomaDeD: Automata-based debugging for dissimilar parallel tasks.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

2009
Scalable temporal order analysis for large scale debugging.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

How to Keep Your Head above Water While Detecting Errors.
Proceedings of the Middleware 2009, ACM/IFIP/USENIX, 10th International Middleware Conference, Urbana, IL, USA, November 30, 2009

Stateful error detection in high throughput applications.
Proceedings of the Middleware 2008, 2009

2007
Stateful Detection in High Throughput Distributed Systems.
Proceedings of the 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), 2007

Distributed Diagnosis of Failures in a Three Tier E-Commerce System.
Proceedings of the 26th IEEE Symposium on Reliable Distributed Systems (SRDS 2007), 2007

