Martin Schulz

ORCID: 0000-0001-9013-435X

Affiliations:
  • Technical University Munich, Germany
  • Lawrence Livermore National Laboratory, Computer Science Group (former)


According to our database, Martin Schulz authored at least 315 papers between 1997 and 2024.

Bibliography

2024
Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities.
IEEE Trans. Parallel Distributed Syst., September, 2024

Malleability techniques applications in high-performance computing.
Int. J. High Perform. Comput. Appl., 2024

Comparison of Atom Detection Algorithms for Neutral Atom Quantum Computing.
CoRR, 2024

Integration of Quantum Accelerators into HPC: Toward a Unified Quantum Platform.
CoRR, 2024

Design Principles of Dynamic Resource Management for High-Performance Parallel Programming Models.
CoRR, 2024

Every Mapping Counts in Large Amounts: Folio Accounting.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

Calibration and Performance Evaluation of a Superconducting Quantum Processor in an HPC Center.
ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Dynamic Resource Management for In-Situ Techniques Using MPI-Sessions.
Proceedings of the Recent Advances in the Message Passing Interface, 2024

Adopting User-Space Networking for DDS Message-Oriented Middleware.
Proceedings of the IEEE International Conference on Pervasive Computing and Communications, 2024

sys-sage: A Unified Representation of Dynamic Topologies & Attributes on HPC Systems.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

A Mechanism to Generate Interception Based Tools for HPC Libraries.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

Dataset Distillation by Automatic Training Trajectories.
Proceedings of the Computer Vision - ECCV 2024, 2024

Distributed Order Recording Techniques for Efficient Record-and-Replay of Multi-Threaded Programs.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

A Portable Tool to Compare Performance Profiles from GPU Offloading Programming Models.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

The European Chips Act and its Impact on Teaching.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

From the Physics Lab to the Computer Lab: Towards Flexible and Comprehensive DevOps for Quantum Computing.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

Exploring the ARM Coherent Mesh Network Topology.
Proceedings of the Architecture of Computing Systems - 37th International Conference, 2024

2023
Quantum Task Offloading with the OpenMP API.
CoRR, 2023

Integration of Quantum Accelerators with High Performance Computing - A Review of Quantum Programming Tools.
CoRR, 2023

GreenCourier: Carbon-Aware Scheduling for Serverless Functions.
Proceedings of the 9th International Workshop on Serverless Computing, 2023

Overcoming Weak Scaling Challenges in Tree-Based Nearest Neighbor Time Series Mining.
Proceedings of the High Performance Computing - 38th International Conference, 2023

A Case Study on PMIx-Usage for Dynamic Resource Management.
Proceedings of the High Performance Computing, 2023

Probabilistic Job History Conversion and Performance Model Generation for Malleable Scheduling Simulations.
Proceedings of the High Performance Computing, 2023

GPUscout: Locating Data Movement-related Bottlenecks on GPUs.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Sustainability in HPC: Vision and Opportunities.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

DDS Implementations as Real-Time Middleware - A Systematic Evaluation.
Proceedings of the 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2023

Realistic Neutral Atom Image Simulation.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Toward a Unified Hybrid HPCQC Toolchain.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Towards the Munich Quantum Software Stack: Enabling Efficient Access and Tool Support for Quantum Computers.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Quantum Computer Metrics and HPC Center Environmental Sensor Data Analysis Towards Fidelity Prediction.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Systematic Analysis of DDS Implementations.
Proceedings of the 24th International Middleware Conference, 2023

Federated Learning via Decentralized Dataset Distillation in Resource-Constrained Edge Environments.
Proceedings of the International Joint Conference on Neural Networks, 2023

Real-Time Capability of DLR's Beamforming Synthetic Aperture Radar Processing Architecture.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023

HiSEP-Q: A Highly Scalable and Efficient Quantum Control Processor for Superconducting Qubits.
Proceedings of the 41st IEEE International Conference on Computer Design, 2023

A Scalable and Cross-Technology Quantum Control Processor.
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023

OpenCUBE: Building an Open Source Cloud Blueprint with EPI Systems.
Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

Hierarchical Resource Partitioning on Modern GPUs: A Reinforcement Learning Approach.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

Copy-on-Pin: The Missing Piece for Correct Copy-on-Write.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Operational Data Analytics in practice: Experiences from design to deployment in production HPC environments.
Parallel Comput., 2022

Resiliency in numerical algorithm design for extreme scale simulations.
Int. J. High Perform. Comput. Appl., 2022

Accelerating HPC With Quantum Computing: It Is a Software Challenge Too.
Comput. Sci. Eng., 2022

An Emulation Layer for Dynamic Resources with MPI Sessions.
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems.
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

Towards Dynamic Resource Management with MPI Sessions and PMIx.
Proceedings of the EuroMPI/USA'22: 29th European MPI Users' Group Meeting, Chattanooga, TN, USA, September 26, 2022

Exploiting Reduced Precision for GPU-based Time Series Mining.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps.
Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

Resource-Constrained Optimizations For Synthetic Aperture Radar On-Board Image Processing.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022

Querying Distributed Sensor Streams in the Edge-to-Cloud Continuum.
Proceedings of the IEEE International Conference on Edge Computing and Communications, 2022

Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning.
Proceedings of the Architecture of Computing Systems - 35th International Conference, 2022

Energy Efficient Frequency Scaling on GPUs in Heterogeneous HPC Systems.
Proceedings of the Architecture of Computing Systems - 35th International Conference, 2022

2021
Guest Editorial: Special Issue on Computing Frontiers.
J. Signal Process. Syst., 2021

PredCom: A Predictive Approach to Collecting Approximated Communication Traces.
IEEE Trans. Parallel Distributed Syst., 2021

Quantum Algorithms for Solving Ordinary Differential Equations via Classical Integration Methods.
Quantum, 2021

Graph-based multi-core higher-order time integration of linear autonomous partial differential equations.
J. Comput. Sci., 2021

Understanding I/O Behavior in Scientific and Data-Intensive Computing (Dagstuhl Seminar 21332).
Dagstuhl Reports, 2021

virtio-mem: paravirtualized memory hot(un)plug.
Proceedings of the VEE '21: 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2021

Efficient LLVM-based dynamic binary translation.
Proceedings of the VEE '21: 17th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2021

A next-generation discontinuous Galerkin fluid dynamics solver with application to high-resolution lung airflow simulations.
Proceedings of the International Conference for High Performance Computing, 2021

Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

On the Inevitability of Integrated HPC Systems and How they will Change HPC System Operations.
Proceedings of the HEART '21: 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2021

Living on the Edge: Efficient Handling of Large Scale Sensor Data.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
Footprint-Based DIMM Hotplug.
IEEE Trans. Computers, 2020

QMPI: A next generation MPI profiling interface for modern HPC platforms.
Parallel Comput., 2020

EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.
Concurr. Comput. Pract. Exp., 2020

A survey of MPI usage in the US exascale computing project.
Concurr. Comput. Pract. Exp., 2020

Instrew: leveraging LLVM for high performance dynamic binary instrumentation.
Proceedings of the VEE '20: 16th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2020

Time Series Mining at Petascale Performance.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Characterizing HPC Performance Variation with Monitoring and Unsupervised Learning.
Proceedings of the High Performance Computing, 2020

Footprint-Aware Power Capping for Hybrid Memory Based Systems.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Pattern-Aware Staging for Hybrid Memory Systems.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Workshop 16: SNACS Scalable Networks for Advanced Computing Systems.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Cache-Aware Matrix Polynomials.
Proceedings of the Computational Science - ICCS 2020, 2020

DCDB Wintermute: Enabling Online and Holistic Operational Data Analytics on HPC Systems.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

2019
The MPI_T events interface: An early evaluation and overview of the interface.
Parallel Comput., 2019

Pruners.
Int. J. High Perform. Comput. Appl., 2019

From facility to application sensor data: modular, continuous and holistic monitoring with DCDB.
Proceedings of the International Conference for High Performance Computing, 2019

Predicting faults in high performance computing systems: an in-depth survey of the state-of-the-practice.
Proceedings of the International Conference for High Performance Computing, 2019

QMPI: a next generation MPI profiling interface for modern HPC platforms.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

Optimizing computation-communication overlap in asynchronous task-based programs: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Exploring High Bandwidth Memory for PET Image Reconstruction.
Proceedings of the Parallel Computing: Technology Trends, 2019

SAFIRE: Scalable and Accurate Fault Injection for Parallel Multithreaded Applications.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Power efficient job scheduling by predicting the impact of processor manufacturing variability.
Proceedings of the ACM International Conference on Supercomputing, 2019

Optimizing computation-communication overlap in asynchronous task-based programs.
Proceedings of the ACM International Conference on Supercomputing, 2019

Reducing False Node Failure Predictions in HPC.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

2018
MemAxes: Visualization and Analytics for Characterizing Complex Memory Performance Behaviors.
IEEE Trans. Vis. Comput. Graph., 2018

FlipTracker: understanding natural error resilience in HPC applications.
Proceedings of the International Conference for High Performance Computing, 2018

Enabling callback-driven runtime introspection via MPI_T.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Analyzing Resource Trade-offs in Hardware Overprovisioned Supercomputers.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Interference between I/O and MPI Traffic on Fat-tree Networks.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Thread-local concurrency: a technique to handle data race detection at programming model abstraction.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Co-Scheduling in a Task-Based Programming Model.
Proceedings of the 3rd Workshop on Co-Scheduling of HPC Applications, 2018

Panel discussions: "Challenges to the scaling limits: How can we achieve sustainable power-performance improvements?".
Proceedings of the 2018 IEEE Symposium in Low-Power and High-Speed Chips, 2018

2017
ScrubJay: deriving knowledge from the disarray of HPC performance data.
Proceedings of the International Conference for High Performance Computing, 2017

REFINE: realistic fault injection via compiler-based instrumentation for accuracy, portability and speed.
Proceedings of the International Conference for High Performance Computing, 2017

Simulating Power Scheduling at Scale.
Proceedings of the 5th International Workshop on Energy Efficient Supercomputing, 2017

Noise Injection Techniques to Expose Subtle and Unintended Message Races.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

OpenMP Tools Interface: Synchronization Information for Data Race Detection.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Production Hardware Overprovisioning: Real-World Performance Optimization Using an Extensible Power-Aware Resource Management Framework.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Power Aware High Performance Computing: Challenges and Opportunities for Application and System Developers - Survey & Tutorial.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration).
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017

Understanding the Spatial Characteristics of DRAM Errors in HPC Clusters.
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale, 2017

Flexible Data Aggregation for Performance Profiling.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Exploiting Redundancy and Application Scalability for Cost-Effective, Time-Constrained Execution of HPC Applications on Amazon EC2.
IEEE Trans. Parallel Distributed Syst., 2016

Ordering Traces Logically to Identify Lateness in Message Passing Programs.
IEEE Trans. Parallel Distributed Syst., 2016

Evaluating and extending user-level fault tolerance in MPI applications.
Int. J. High Perform. Comput. Appl., 2016

Exploring the MPI tool information interface: features and capabilities.
Int. J. High Perform. Comput. Appl., 2016

Development effort estimation in HPC.
Proceedings of the International Conference for High Performance Computing, 2016

Economic Viability of Hardware Overprovisioning in Power-Constrained High Performance Computing.
Proceedings of the 4th International Workshop on Energy Efficient Supercomputing, 2016

VIPACT: A Visualization Interface for Analyzing Calling Context Trees.
Proceedings of the Third Workshop on Visual Performance Analysis, 2016

Pinpointing scale-dependent integer overflow bugs in large-scale parallel applications.
Proceedings of the International Conference for High Performance Computing, 2016

A machine learning framework for performance coverage analysis of proxy applications.
Proceedings of the International Conference for High Performance Computing, 2016

A Performance Model for Allocating the Parallelism in a Multigrid-in-Time Solver.
Proceedings of the 7th International Workshop on Performance Modeling, 2016

A Unified Platform for Exploring Power Management Strategies.
Proceedings of the 4th International Workshop on Energy Efficient Supercomputing, 2016

Caliper: performance introspection for HPC software stacks.
Proceedings of the International Conference for High Performance Computing, 2016

Allowing MPI tools builders to forget about Fortran.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

MPI Sessions: Leveraging Runtime Infrastructure to Increase Scalability of Applications at Exascale.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Testing Infrastructure for OpenMP Debugging Interface Implementations.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Structural Clustering: A New Approach to Support Performance Analysis at Scale.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

I/O Aware Power Shifting.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

MPMD Framework for Offloading Load Balance Computation.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Power Balancing in an Emulated Exascale Environment.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Systemwide Power Management with Argo.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

ARCHER: Effectively Spotting Data Races in Large OpenMP Applications.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Fast Multi-parameter Performance Modeling.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

IPAS: intelligent protection against silent output corruption in scientific applications.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

2015
Connecting Performance Analysis and Visualization (Dagstuhl Perspectives Workshop 14022).
Dagstuhl Manifestos, 2015

Debugging high-performance computing applications at massive scales.
Commun. ACM, 2015

A Run-Time System for Power-Constrained HPC Applications.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Clock delta compression for scalable order-replay of non-deterministic parallel applications.
Proceedings of the International Conference for High Performance Computing, 2015

Recovering logical structure from Charm++ event traces.
Proceedings of the International Conference for High Performance Computing, 2015

Analyzing and mitigating the impact of manufacturing variability in power-constrained supercomputing.
Proceedings of the International Conference for High Performance Computing, 2015

Dynamic power sharing for higher job throughput.
Proceedings of the International Conference for High Performance Computing, 2015

Finding the limits of power-constrained application performance.
Proceedings of the International Conference for High Performance Computing, 2015

Decoupled load balancing.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Lessons Learned from Implementing OMPD: A Debugging Interface for OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Predicting Optimal Power Allocation for CPU and DRAM Domains.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

A Scalable Prescriptive Parallel Debugging Model.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Identifying the Culprits Behind Network Congestion.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Practical Resource Management in Power-Constrained, High Performance Computing.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

POW: System-wide Dynamic Reallocation of Limited Power in HPC.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Event-Action Mappings for Parallel Tools Infrastructures.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Distributed Monitoring and Management of Exascale Systems in the Argo Project.
Proceedings of the Distributed Applications and Interoperable Systems, 2015

An Approach to Selecting Thread + Process Mixes for Hybrid MPI + OpenMP Applications.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Combing the Communication Hairball: Visualizing Parallel Execution Traces using Logical Time.
IEEE Trans. Vis. Comput. Graph., 2014

Enabling fair pricing on high performance computer systems with node sharing.
Sci. Program., 2014

Connecting Performance Analysis and Visualization to Advance Extreme Scale Computing (Dagstuhl Perspectives Workshop 14022).
Dagstuhl Reports, 2014

State of the Art of Performance Visualization.
Proceedings of the 16th Eurographics Conference on Visualization, 2014

Towards providing low-overhead data race detection for large OpenMP applications.
Proceedings of the 2014 LLVM Compiler Infrastructure in HPC, 2014

Algebraic Multigrid on a Dragonfly Network: First Experiences on a Cray XC30.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation, 2014

Evaluating User-Level Fault Tolerance for MPI Applications.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Exploring the Capabilities of the New MPI_T Interface.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Extracting logical structure and identifying stragglers in parallel execution traces.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Accurate application progress analysis for large-scale parallel debugging.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

MPI Runtime Error Detection with MUST: A Scalable and Crash-Safe Approach.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Flux: A Next-Generation Resource Management Framework for Large HPC Centers.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Adaptive Configuration Selection for Power-Constrained Heterogeneous Systems.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Modeling the Impact of Reduced Memory Bandwidth on HPC Applications.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

Memory Usage Optimizations for Online Event Analysis.
Proceedings of the Solving Software Challenges for Exascale, 2014

2013
Strategies for Energy-Efficient Resource Management of Hybrid Programming Models.
IEEE Trans. Parallel Distributed Syst., 2013

Characterizing and mitigating work time inflation in task parallel programs.
Sci. Program., 2013

MPI runtime error detection with MUST: Advances in deadlock detection.
Sci. Program., 2013

Parallelizing heavyweight debugging tools with mpiecho.
Parallel Comput., 2013

LIBI: A framework for bootstrapping extreme scale software systems.
Parallel Comput., 2013

A study of application-level recovery methods for transient network faults.
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Enabling fair pricing on HPC systems with node sharing.
Proceedings of the International Conference for High Performance Computing, 2013

Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset.
Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering, 2013

Runtime MPI collective checking with tree-based overlay networks.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Performance Analysis Techniques for the Exascale Co-Design Process.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

OMPT: An OpenMP Tools Application Programming Interface for Performance Analysis.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Systematic Reduction of Data Movement in Algebraic Multigrid Solvers.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Efficient and Scalable Retrieval Techniques for Global File Properties.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Exploring hardware overprovisioning in power-constrained, high performance computing.
Proceedings of the International Conference on Supercomputing, 2013

Intralayer Communication for Tree-Based Overlay Networks.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

A comparative study of high-performance computing on the cloud.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Alignment-Based Metrics for Trace Comparison.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations.
IEEE Trans. Vis. Comput. Graph., 2012

What scientific applications can benefit from hardware transactional memory?
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Performance Modeling of Algebraic Multigrid on Blue Gene/Q: Lessons Learned.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Mapping applications with collectives over sub-communicators on torus networks.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Novel views of performance data to analyze large-scale adaptive applications.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

MPI Runtime Error Detection with MUST: Advanced Error Reports.
Proceedings of the Tools for High Performance Computing 2012, 2012

The myrmics memory allocator: hierarchical, message-passing allocation for global address spaces.
Proceedings of the International Symposium on Memory Management, 2012

Beyond DVFS: A First Look at Performance under a Hardware-Enforced Power Bound.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Critical-Path Based Performance Analysis.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Quantifying the effectiveness of load balance algorithms.
Proceedings of the International Conference on Supercomputing, 2012

Fault resilience of the algebraic multi-grid solver.
Proceedings of the International Conference on Supercomputing, 2012

Mechanisms and Evaluation of Cross-Layer Fault-Tolerance for Supercomputing.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Modeling the Performance of an Algebraic Multigrid Cycle Using Hybrid MPI/OpenMP.
Proceedings of the 41st International Conference on Parallel Processing, 2012

2011
Checkpointing.
Encyclopedia of Parallel Computing, 2011

Formal analysis of MPI-based parallel programs.
Commun. ACM, 2011

Large scale debugging of parallel tasks with AutomaDeD.
Proceedings of the Conference on High Performance Computing Networking, 2011

Order Preserving Event Aggregation in TBONs.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Creating a Tool Set for Optimizing Topology-Aware Node Mappings.
Proceedings of the Tools for High Performance Computing 2011, 2011

Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Exploiting Data Similarity to Reduce Memory Footprints.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Challenges of Scaling Algebraic Multigrid Across Modern Multicore Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Modeling the performance of an algebraic multigrid cycle on HPC platforms.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Interpreting Performance Data across Intuitive Domains.
Proceedings of the International Conference on Parallel Processing, 2011

Practical performance prediction under Dynamic Voltage Frequency Scaling.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

Scalable memory registration for high performance networks using helper threads.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Large Scale Verification of MPI Programs Using Lamport Clocks with Lazy Update.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Transforming MPI source code based on communication patterns.
Future Gener. Comput. Syst., 2010

On the Performance of an Algebraic Multigrid Solver on Multicore Clusters.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

A Scalable and Distributed Dynamic Formal Verifier for MPI Programs.
Proceedings of the Conference on High Performance Computing Networking, 2010

ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale.
Proceedings of the Applied Parallel and Scientific Computing, 2010

Hybrid MPI/OpenMP power-aware computing.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Power-aware MPI task aggregation prediction for high-end computing systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Using focused regression for accurate time-constrained scaling of scientific applications.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Clustering performance data efficiently at massive scales.
Proceedings of the 24th International Conference on Supercomputing, 2010

Exploitation of Dynamic Communication Patterns through Static Analysis.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Comparing Scalability Prediction Strategies on an SMP of CMPs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

AutomaDeD: Automata-based debugging for dissimilar parallel tasks.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

10181 Executive Summary - Program Development for Extreme-Scale Computing.
Proceedings of the Program Development for Extreme-Scale Computing, 02.05. - 07.05.2010, 2010

10181 Abstracts Collection - Program Development for Extreme-Scale Computing.
Proceedings of the Program Development for Extreme-Scale Computing, 02.05. - 07.05.2010, 2010

Scaling Algebraic Multigrid Solvers: On the Road to Exascale.
Proceedings of the Competence in High Performance Computing 2010, 2010

2009
ScalaTrace: Scalable compression and replay of communication traces for high-performance computing.
J. Parallel Distributed Comput., 2009

Scalable temporal order analysis for large scale debugging.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

8th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

MUST: A Scalable Approach to Runtime Error Detection in MPI Programs.
Proceedings of the Tools for High Performance Computing 2009, 2009

PSMalloc: content based memory management for MPI applications.
Proceedings of the 10th workshop on MEmory performance, 2009

Machine learning based online performance prediction for runtime parallelization and task scheduling.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Adagio: making DVS practical for complex HPC applications.
Proceedings of the 23rd international conference on Supercomputing, 2009

A graph based approach for MPI deadlock detection.
Proceedings of the 23rd international conference on Supercomputing, 2009

2008
Efficient architectural design space exploration via predictive modeling.
ACM Trans. Archit. Code Optim., 2008

Open|SpeedShop: An open source infrastructure for parallel performance analysis.
Sci. Program., 2008

BlueGene/L applications: Parallelism On a Massive Scale.
Int. J. High Perform. Comput. Appl., 2008

Lessons learned at 208K: towards debugging millions of cores.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Scalable load-balance measurement for SPMD codes.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

7th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments: New Directions and Work-in-Progress (ParSim 2008).
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

On the Performance of Transparent MPI Piggyback Messages.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Preserving time in large-scale communication traces.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

A regression-based approach to scalability prediction.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Detecting Patterns in MPI Communication Traces.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Overcoming Scalability Challenges for Tool Daemon Launching.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Using MPI Communication Patterns to Guide Source Code Transformations.
Proceedings of the Computational Science, 2008

Topic 2: Performance Prediction and Evaluation.
Proceedings of the Euro-Par 2008, 2008

Prediction models for multi-dimensional power-performance optimization on many cores.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Dynamic Binary Instrumentation and Data Aggregation on Large Scale Systems.
Int. J. Parallel Program., 2007

Predicting parallel application performance via machine learning approaches.
Concurr. Comput. Pract. Exp., 2007

P^NMPI tools: a whole lot greater than the sum of their parts.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Bounding energy consumption in large-scale MPI programs.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

6th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments: New Directions and Work-in-Progress (ParSim 2007).
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Methods of inference and learning for performance modeling of parallel applications.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Benchmarking the Stack Trace Analysis Tool for BlueGene/L.
Proceedings of the Parallel Computing: Architectures, 2007

Scalable Compression and Replay of Communication Traces in Massively Parallel Environments.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Stack Trace Analysis for Large Scale Debugging.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Practical Differential Profiling.
Proceedings of the Euro-Par 2007, 2007

Identifying energy-efficient concurrency levels using machine learning.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Poster reception - Scalable compression and replay of communication traces in massively parallel environments.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Gordon Bell finalists I - Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Patterns in parallel programs: toward high-level understanding of large-scale traces.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

5th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Improving distributed memory applications testing by message perturbation.
Proceedings of the 4th Workshop on Parallel and Distributed Systems: Testing, 2006

Dynamic program phase detection in distributed shared-memory multiprocessors.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A Flexible and Dynamic Infrastructure for MPI Tool Interoperability.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Exploring Unexpected Behavior in MPI.
Proceedings of the High Performance Computing and Communications, 2006

Efficiently exploring architectural design spaces via predictive modeling.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005
Scalable dynamic binary instrumentation for Blue Gene/L.
SIGARCH Comput. Archit. News, 2005

Simulation as a tool for optimizing memory accesses on NUMA machines.
Perform. Evaluation, 2005

Monitoring cache behavior on parallel SMP architectures and related programming tools.
Future Gener. Comput. Syst., 2005

4th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments (ParSim 2005).
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Improving the computational intensity of unstructured mesh applications.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

DynTG: A Tool for Interactive, Dynamic Instrumentation.
Proceedings of the Computational Science, 2005

An Approach to Performance Prediction for Parallel Applications.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Extracting Critical Path Graphs from MPI Applications.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Owl: next generation system monitoring.
Proceedings of the Second Conference on Computing Frontiers, 2005

2004
SIMT/OMP: A Toolset to Study and Exploit Memory Locality of OpenMP Applications on NUMA Architectures.
Proceedings of the Shared Memory Parallel Programming with OpenMP, 2004

Implementation and Evaluation of a Scalable Application-Level Checkpoint-Recovery Scheme for MPI Programs.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Current Trends in Numerical Simulation for Parallel Engineering Environments. ParSim 2004.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Application-level checkpointing for shared memory programs.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

SimSnap: Fast-Forwarding via Native Execution and Application-Level Checkpointing.
Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003
Pathways of Relevance: Exploring Inflows of Knowledge into Subunits of Multinational Corporations.
Organ. Sci., 2003

ARS: an adaptive runtime system for locality optimization.
Future Gener. Comput. Syst., 2003

SMiLE: an integrated, multi-paradigm software infrastructure for SCI-based clusters.
Future Gener. Comput. Syst., 2003

Interactive Locality Optimization on NUMA Architectures.
Proceedings of the ACM 2003 Symposium on Software Visualization, 2003

Identifying and Exploiting Spatial Regularity in Data Memory References.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Special Session of EuroPVM/MPI 2003: Current Trends in Numerical Simulation for Parallel Engineering Environments - ParSim 2003.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

A Framework for Portable Shared Memory Programming.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

CAD Grid: Corporate-Wide Resource Sharing for Parameter Studies.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

A Simulation Tool for Evaluating Shared Memory Systems.
Proceedings of the 36th Annual Simulation Symposium (ANSS-36 2003), Orlando, Florida, USA, March 30, 2003

2002
Memory access behavior analysis of NUMA-based shared memory programs.
Sci. Program., 2002

A Comprehensive Electric Field Simulation Environment on Top of SCI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September 29, 2002

Current Trends in Numerical Simulation for Parallel Engineering Environments.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September 29, 2002

Notes on Nondeterminism in Message Passing Programs.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September 29, 2002

Performance Analysis for Teraflop Computers: A Distributed Automatic Approach.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Boosting the Performance of Electromagnetic Simulations on a PC-Cluster.
Proceedings of the 2002 International Conference on Parallel Computing in Electrical Engineering (PARELEC 2002), 2002

A proposal for a new hardware cache monitoring architecture.
Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002

Improving Data Locality Using Dynamic Page Migration Based on Memory Access Histograms.
Proceedings of the Computational Science - ICCS 2002, 2002

Using Semantic Information to Guide Efficient Parallel I/O on Clusters.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

SMiLE: An Integrated, Multi-Paradigm Software Infrastructure for SCI-Based Clusters.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002

Overcoming the Problems Associated with the Existence of Too Many DSM APIs.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002

2001
Shared memory programming on NUMA-based clusters using a general and open hybrid hardware/software approach.
PhD thesis, 2001

Parallel Volume Rendering based on Isosurface Extraction using Commodity Clusters.
Proceedings of the IASTED International Conference on Visualization, 2001

SCI-Based LINUX PC-Clusters as a Platform for Electromagnetic Field Calculations.
Proceedings of the Parallel Computing Technologies, 2001

Visualizing the Memory Access Behavior of Shared Memory Applications on NUMA Architectures.
Proceedings of the Computational Science - ICCS 2001, 2001

Meeting the Computational Demands of Nuclear Medical Imaging Using Commodity Clusters.
Proceedings of the Computational Science - ICCS 2001, 2001

2000
Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2000).
Proceedings of the Parallel and Distributed Processing, 2000

Using the SMiLE Monitoring Infrastructure to Detect and Lower the Inefficiency of Parallel Applications.
Proceedings of the High-Performance Computing and Networking, 8th International Conference, 2000

NEPHEW: Applying a Toolset for the Efficient Deployment of a Medical Image Application on SCI-Based Clusters.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Multilayer Online-Monitoring for Hybrid DSM Systems on Top of PC Clusters with a SMiLE.
Proceedings of the Computer Performance Evaluation: Modelling Techniques and Tools, 2000

Multithreaded Programming of PC Clusters.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999
True Shared Memory Programming on SCI-Based Clusters.
Proceedings of the SCI: Scalable Coherent Interface, 1999

SCI-VM: A Flexible Base for Transparent Shared Memory Programming Models on Clusters of PCs.
Proceedings of the Parallel and Distributed Processing, 1999

Supporting Shared Memory and Message Passing on Clusters of PCs with a SMiLE.
Proceedings of the Network-Based Parallel Computing: Communication, 1999

Optimizing Data Locality for SCI-Based PC-Clusters with the SMiLE Monitoring Approach.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
SISCI-Pthreads, SMP-like programming on an SCI-cluster.
Proceedings of the High-Performance Computing and Networking, 1998

1997
Architectural Adaptation for Application-Specific Locality Optimization.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997
