Bronis R. de Supinski

Proceedings of the 20th IEEE International Conference on e-Science, 2024

BLP: Block-Level Pipelining for GPUs.

[BibT_eX]

[DOI]

Xuewen Cui

Thomas Scogland

Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

2023

Performance on HPC Platforms Is Possible Without C++.

[BibT_eX]

[DOI]

Anshu Dubey

Tal Ben-Nun

Bradford L. Chamberlain

Damian W. I. Rouson

Comput. Sci. Eng., 2023

Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems.

[BibT_eX]

[DOI]

Konstantinos Parasyris

David Beckingsale

CoRR, 2023

LM4HPC: Towards Effective Language Model Application in High-Performance Computing.

[BibT_eX]

[DOI]

Le Chen

Pei-Hung Lin

Tristan Vanderbruggen

Murali Emani

Proceedings of the OpenMP: Advanced Task-Based, Device and Compiler Programming, 2023

2022

Data-driven global weather predictions at high resolutions.

[BibT_eX]

[DOI]

John A. Taylor

Pablo Rozas Larraondo

Int. J. High Perform. Comput. Appl., 2022

An analytical performance model of generalized hierarchical scheduling.

[BibT_eX]

[DOI]

Michela Taufer

Int. J. High Perform. Comput. Appl., 2022

Extending OpenMP to Support Automated Function Specialization Across Translation Units.

[BibT_eX]

[DOI]

Proceedings of the OpenMP in a Modern World: From Multi-device Support to Meta Programming, 2022

Scalable Composition and Analysis Techniques for Massive Scientific Workflows.

[BibT_eX]

[DOI]

Brian Van Essen

Jonathan E. Allen

Felice C. Lightstone

Proceedings of the 18th IEEE International Conference on e-Science, 2022

2021

Mitigating Inter-Job Interference via Process-Level Quality-of-Service.

[BibT_eX]

[DOI]

Lee Savoie

Nikhil Jain

ACM Trans. Parallel Comput., 2021

Special Issue Introduction: The Gordon Bell Special Prize for HPC-Based COVID-19 Research Finalists.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2021

Extending OpenMP for Machine Learning-Driven Adaptation.

[BibT_eX]

[DOI]

Anjia Wang

David Beckingsale

Proceedings of the Accelerator Programming Using Directives - 8th International Workshop, 2021

Beyond Explicit Transfers: Shared and Managed Memory in OpenMP.

[BibT_eX]

[DOI]

Brandon Neth

Alejandro Duran

Proceedings of the OpenMP: Enabling Massive Node-Level Parallelism, 2021

Inter-loop optimization in RAJA using loop chains.

[BibT_eX]

[DOI]

Brandon Neth

Michelle Mills Strout

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Monitoring Large Scale Supercomputers: A Case Study with the Lassen Supercomputer.

[BibT_eX]

[DOI]

Nathan Besaw

Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020

Unified Sequential Optimization Directives in OpenMP.

[BibT_eX]

[DOI]

Brandon Neth

Michelle Mills Strout

Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

2019

Statistical and machine learning models for optimizing energy in parallel applications.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2019

Preparation and optimization of a diverse workload for a large-scale heterogeneous system.

[BibT_eX]

[DOI]

Ian Karlin

Yoonho Park

Guillaume Thomas-Collignon

Sara Kokkila Schumacher

Proceedings of the International Conference for High Performance Computing, 2019

Ompparser: A Standalone and Unified OpenMP Parser.

[BibT_eX]

[DOI]

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

A Framework for Enabling OpenMP Autotuning.

[BibT_eX]

[DOI]

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Making OpenMP Ready for C++ Executors.

[BibT_eX]

[DOI]

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Extending OpenMP Metadirective Semantics for Runtime Adaptation.

[BibT_eX]

[DOI]

Anjia Wang

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

2018

The Ongoing Evolution of OpenMP.

[BibT_eX]

[DOI]

Proc. IEEE, 2018

Big data and extreme-scale computing.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2018

The design, deployment, and evaluation of the CORAL pre-exascale systems.

[BibT_eX]

[DOI]

Proceedings of the International Conference for High Performance Computing, 2018

Energy efficiency modeling of parallel applications.

[BibT_eX]

[DOI]

Proceedings of the International Conference for High Performance Computing, 2018

Extending OpenMP to Facilitate Loop Optimization.

[BibT_eX]

[DOI]

Ian J. Bertolacci

Michelle Mills Strout

Eddie C. Davis

Catherine Olschanowsky

Proceedings of the Evolving OpenMP for Evolving Architectures, 2018

A Study of Network Quality of Service in Many-Core MPI Applications.

[BibT_eX]

[DOI]

Lee Savoie

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017

ALEA: A Fine-Grained Energy Profiling Tool.

[BibT_eX]

[DOI]

Lev Mukhanov

Pavlos Petoumenos

Zheng Wang

Konstantinos Parasyris

Hugh Leather

ACM Trans. Archit. Code Optim., 2017

SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads.

[BibT_eX]

[DOI]

Hans Vandierendonck

Peter Thoman

Thomas Fahringer

ACM Trans. Archit. Code Optim., 2017

A survey on software methods to improve the energy efficiency of parallel computing.

[BibT_eX]

[DOI]

Chao Jin

Int. J. High Perform. Comput. Appl., 2017

Application Modernization for the Exascale Era.

[BibT_eX]

[DOI]

J. Robert Neely

Charles H. Still

Comput. Sci. Eng., 2017

Application Modernization at LLNL and the Sierra Center of Excellence.

[BibT_eX]

[DOI]

J. Robert Neely

Comput. Sci. Eng., 2017

A Bottleneck-Centric Tuning Policy for Optimizing Energy in Parallel Programs.

[BibT_eX]

[DOI]

Proceedings of the Parallel Computing is Everywhere, 2017

Custom Data Mapping for Composable Data Management.

[BibT_eX]

[DOI]

Tom Scogland

Chris Earl

Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

Directive-Based Partitioning and Pipelining for Graphics Processing Units.

[BibT_eX]

[DOI]

Xuewen Cui

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016

Exploiting Redundancy and Application Scalability for Cost-Effective, Time-Constrained Execution of HPC Applications on Amazon EC2.

[BibT_eX]

[DOI]

Aniruddha Marathe

Rachel Harris

IEEE Trans. Parallel Distributed Syst., 2016

Evaluating and extending user-level fault tolerance in MPI applications.

[BibT_eX]

[DOI]

Howard Pritchard

Int. J. High Perform. Comput. Appl., 2016

Economic Viability of Hardware Overprovisioning in Power-Constrained High Performance Computing.

[BibT_eX]

[DOI]

Proceedings of the 4th International Workshop on Energy Efficient Supercomputing, 2016

Runtime Correctness Analysis of MPI-3 Nonblocking Collectives.

[BibT_eX]

[DOI]

Matthias Weber

Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Approaches for Task Affinity in OpenMP.

[BibT_eX]

[DOI]

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

A Case for Extending Task Dependencies.

[BibT_eX]

[DOI]

Tom Scogland

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Early Experiences Porting Three Applications to OpenMP 4.5.

[BibT_eX]

[DOI]

Gheorghe-Teodor Bercea

Carlo Bertolli

Alexandre E. Eichenberger

Erik W. Draeger

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Transactional Memory for Algebraic Multigrid Smoothers.

[BibT_eX]

[DOI]

Ulrike Meier Yang

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

I/O Aware Power Shifting.

[BibT_eX]

[DOI]

Lee Savoie

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

MPMD Framework for Offloading Load Balance Computation.

[BibT_eX]

[DOI]

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Directive-Based Pipelining Extension for OpenMP.

[BibT_eX]

[DOI]

Xuewen Cui

Wu-Chun Feng

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

A scalable and composable map-reduce system.

[BibT_eX]

[DOI]

Mahwish Arif

Hans Vandierendonck

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015

CoreTSAR: Core Task-Size Adapting Runtime.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2015

Diagnosis of Performance Faults in Large-Scale MPI Applications via Probabilistic Progress-Dependence Inference.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2015

Debugging high-performance computing applications at massive scales.

[BibT_eX]

[DOI]

Commun. ACM, 2015

A Run-Time System for Power-Constrained HPC Applications.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing - 30th International Conference, 2015

The Spack package manager: bringing order to HPC software chaos.

[BibT_eX]

[DOI]

Scott Futral

Proceedings of the International Conference for High Performance Computing, 2015

Decoupled load balancing.

[BibT_eX]

[DOI]

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Supporting multiple accelerators in high-level programming models.

[BibT_eX]

[DOI]

Pei-Hung Lin

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

HpMC: An Energy-aware Management System of Multi-level Memory Architectures.

[BibT_eX]

[DOI]

Gabriel H. Loh

Proceedings of the 2015 International Symposium on Memory Systems, 2015

Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?

[BibT_eX]

[DOI]

Milan Radulovic

Darko Zivanovic

Daniel Ruiz

Petar Radojkovic

Eduard Ayguadé

Proceedings of the 2015 International Symposium on Memory Systems, 2015

Supporting Indirect Data Mapping in OpenMP.

[BibT_eX]

[DOI]

Jeff Keasler

Rich Hornung

Hal Finkel

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Enabling Region Merging Optimizations in OpenMP.

[BibT_eX]

[DOI]

Jeff Keasler

Rich Hornung

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Towards Task-Parallel Reductions in OpenMP.

[BibT_eX]

[DOI]

Alexandre E. Eichenberger

Stephen Olivier

Kelvin Li

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Practical Resource Management in Power-Constrained, High Performance Computing.

[BibT_eX]

[DOI]

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Event-Action Mappings for Parallel Tools Infrastructures.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

ALEA: Fine-Grain Energy Profiling with Basic Block Sampling.

[BibT_eX]

[DOI]

Lev Mukhanov

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Detailed Modeling and Evaluation of a Scalable Multilevel Checkpointing System.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2014

CoreTSAR: Adaptive Worksharing for Heterogeneous Systems.

[BibT_eX]

[DOI]

Proceedings of the Supercomputing - 29th International Conference, 2014

Evaluating User-Level Fault Tolerance for MPI Applications.

[BibT_eX]

[DOI]

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Towards Transactional Memory for OpenMP.

[BibT_eX]

[DOI]

Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

On the Algorithmic Aspects of Using OpenMP Synchronization Mechanisms: The Effects of Transactional Memory.

[BibT_eX]

[DOI]

Lori A. Diachin

Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

FMI: Fault Tolerant Messaging Interface for Fast and Transparent Recovery.

[BibT_eX]

[DOI]

Naoya Maruyama

Satoshi Matsuoka

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Load balancing n-body simulations with highly non-uniform density.

[BibT_eX]

[DOI]

Tom Arsenlis

Proceedings of the 2014 International Conference on Supercomputing, 2014

MPI Runtime Error Detection with MUST: A Scalable and Crash-Safe Approach.

[BibT_eX]

[DOI]

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Adaptive Configuration Selection for Power-Constrained Heterogeneous Systems.

[BibT_eX]

[DOI]

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Exploiting redundancy for cost-effective, time-constrained execution of HPC applications on Amazon EC2.

[BibT_eX]

[DOI]

Aniruddha Marathe

Rachel Harris

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Memory Usage Optimizations for Online Event Analysis.

[BibT_eX]

[DOI]

Proceedings of the Solving Software Challenges for Exascale, 2014

A User-Level InfiniBand-Based File System and Checkpoint Strategy for Burst Buffers.

[BibT_eX]

[DOI]

Naoya Maruyama

Satoshi Matsuoka

Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2013

Strategies for Energy-Efficient Resource Management of Hybrid Programming Models.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2013

Characterizing and mitigating work time inflation in task parallel programs.

[BibT_eX]

[DOI]

Stephen L. Olivier

Jan F. Prins

Sci. Program., 2013

McrEngine: A scalable checkpointing system using data-aware aggregation and compression.

[BibT_eX]

[DOI]

Rudolf Eigenmann

Sci. Program., 2013

MPI runtime error detection with MUST: Advances in deadlock detection.

[BibT_eX]

[DOI]

Sci. Program., 2013

Parallelizing heavyweight debugging tools with mpiecho.

[BibT_eX]

[DOI]

Parallel Comput., 2013

LIBI: A framework for bootstrapping extreme scale software systems.

[BibT_eX]

[DOI]

Parallel Comput., 2013

Trellis: Portability across architectures with a high-level framework.

[BibT_eX]

[DOI]

Lukasz G. Szafaryn

Kevin Skadron

J. Parallel Distributed Comput., 2013

Distributed wait state tracking for runtime MPI deadlock detection.

[BibT_eX]

[DOI]

Proceedings of the International Conference for High Performance Computing, 2013

Runtime MPI collective checking with tree-based overlay networks.

[BibT_eX]

[DOI]

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Early Experiences with the OpenMP Accelerator Model.

[BibT_eX]

[DOI]

Barbara M. Chapman

Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

HPPAC Introduction.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Efficient and Scalable Retrieval Techniques for Global File Properties.

[BibT_eX]

[DOI]

Michael J. Brim

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Exploring hardware overprovisioning in power-constrained, high performance computing.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Supercomputing, 2013

Automatically adapting programs for mixed-precision floating-point computation.

[BibT_eX]

[DOI]

Michael O. Lam

Jeffrey K. Hollingsworth

Proceedings of the International Conference on Supercomputing, 2013

Massively parallel loading.

[BibT_eX]

[DOI]

Felix Wolf

Proceedings of the International Conference on Supercomputing, 2013

Intralayer Communication for Tree-Based Overlay Networks.

[BibT_eX]

[DOI]

Proceedings of the 42nd International Conference on Parallel Processing, 2013

A comparative study of high-performance computing on the cloud.

[BibT_eX]

[DOI]

Aniruddha Marathe

Rachel Harris

Xin Yuan

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Alignment-Based Metrics for Trace Comparison.

[BibT_eX]

[DOI]

Matthias Weber

Holger Brunst

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Topic 1: Support Tools and Environments - (Introduction).

[BibT_eX]

[DOI]

Bettina Krammer

Karl Fürlinger

Jesús Labarta

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Optimizing power allocation to CPU and memory subsystems in overprovisioned HPC systems.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012

Critical path-based thread placement for NUMA systems.

[BibT_eX]

[DOI]

Chun-Yi Su

Matthew Grove

SIGMETRICS Perform. Evaluation Rev., 2012

Design and modeling of a non-blocking checkpointing system.

[BibT_eX]

[DOI]

Satoshi Matsuoka

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation.

[BibT_eX]

[DOI]

Michael O. Lam

Jeffrey K. Hollingsworth

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Automatically Adapting Programs for Mixed-Precision Floating-Point Computation.

[BibT_eX]

[DOI]

Michael O. Lam

Jeffrey K. Hollingsworth

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications.

[BibT_eX]

[DOI]

Vivek Kale

Torsten Hoefler

William D. Gropp

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Poster: Evaluating Topology Mapping via Graph Partitioning.

[BibT_eX]

[DOI]

Anshu Arya

Laxmikant V. Kalé

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Evaluating Topology Mapping via Graph Partitioning.

[BibT_eX]

[DOI]

Anshu Arya

Laxmikant V. Kalé

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

MPI Runtime Error Detection with MUST: Advanced Error Reports.

[BibT_eX]

[DOI]

Proceedings of the Tools for High Performance Computing 2012, 2012

A Case for Including Transactions in OpenMP II: Hardware Transactional Memory.

[BibT_eX]

[DOI]

Amy Wang

Wang Chen

Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

The myrmics memory allocator: hierarchical, message-passing allocation for global address spaces.

[BibT_eX]

[DOI]

Spyros Lyberis

Polyvios Pratikakis

Proceedings of the International Symposium on Memory Management, 2012

HPPAC Introduction.

[BibT_eX]

[DOI]

Roberto Gioiosa

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Heterogeneous Task Scheduling for Accelerated OpenMP.

[BibT_eX]

[DOI]

Thomas Scogland

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Beyond DVFS: A First Look at Performance under a Hardware-Enforced Power Bound.

[BibT_eX]

[DOI]

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Holistic Debugging of MPI Derived Datatypes.

[BibT_eX]

[DOI]

Andreas Knüpfer

Krishna Chaitanya Kandalla

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers.

[BibT_eX]

[DOI]

Dhabaleswar K. Panda

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

GTI: A Generic Tools Infrastructure for Event-Based Tools in Parallel Systems.

[BibT_eX]

[DOI]

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Critical-Path Based Performance Analysis.

[BibT_eX]

[DOI]

David Böhme

Felix Wolf

Markus Geimer

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Model-based, memory-centric performance and power optimization on NUMA multiprocessors.

[BibT_eX]

[DOI]

Chun-Yi Su

Edgar A. León

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Quantifying the effectiveness of load balance algorithms.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Supercomputing, 2012

Integrated in-system storage architecture for high performance computing.

[BibT_eX]

[DOI]

Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers, 2012

Fault resilience of the algebraic multi-grid solver.

[BibT_eX]

[DOI]

Marc Casas-Guix

Karthikeyan Sankaralingam

Proceedings of the International Conference on Supercomputing, 2012

Mechanisms and Evaluation of Cross-Layer Fault-Tolerance for Supercomputing.

[BibT_eX]

[DOI]

Chen-Han Ho

Marc de Kruijf

Proceedings of the 41st International Conference on Parallel Processing, 2012

Asynchronous checkpoint migration with MRNet in the Scalable Checkpoint / Restart Library.

[BibT_eX]

[DOI]

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2012

Automatic fault characterization via abnormality-enhanced classification.

[BibT_eX]

[DOI]

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

Probabilistic diagnosis of performance faults in large-scale parallel applications.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

The scalable process topology interface of MPI 2.2.

[BibT_eX]

[DOI]

Torsten Hoefler

Rolf Rabenseifner

Hubert Ritzdorf

Rajeev Thakur

Jesper Larsson Träff

Concurr. Comput. Pract. Exp., 2011

Formal analysis of MPI-based parallel programs.

[BibT_eX]

[DOI]

Ganesh Gopalakrishnan

Commun. ACM, 2011

Large scale debugging of parallel tasks with AutomaDeD.

[BibT_eX]

[DOI]

Proceedings of the Conference on High Performance Computing Networking, 2011

Exascale Algorithms for Generalized MPI_Comm_split.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Order Preserving Event Aggregation in TBONs.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2011

OpenMP for Accelerators.

[BibT_eX]

[DOI]

James C. Beyer

Eric J. Stotzer

Alistair Hart

Proceedings of the OpenMP in the Petascale Era - 7th International Workshop on OpenMP, 2011

Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs.

[BibT_eX]

[DOI]

Zoltán Szebenyi

Felix Wolf

Brian J. N. Wylie

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Exploiting Data Similarity to Reduce Memory Footprints.

[BibT_eX]

[DOI]

Susmit Biswas

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Practical performance prediction under Dynamic Voltage Frequency Scaling.

[BibT_eX]

[DOI]

Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

Scalable memory registration for high performance networks using helper threads.

[BibT_eX]

[DOI]

Proceedings of the 8th Conference on Computing Frontiers, 2011

Large Scale Verification of MPI Programs Using Lamport Clocks with Lazy Update.

[BibT_eX]

[DOI]

Anh Vo

Ganesh Gopalakrishnan

Robert M. Kirby

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

Transforming MPI source code based on communication patterns.

[BibT_eX]

[DOI]

Robert Preissl

Dieter Kranzlmüller

Future Gener. Comput. Syst., 2010

A Scalable and Distributed Dynamic Formal Verifier for MPI Programs.

[BibT_eX]

[DOI]

Anh Vo

Sriram Aananthakrishnan

Ganesh Gopalakrishnan

Proceedings of the Conference on High Performance Computing Networking, 2010

Design, Modeling, and Evaluation of a Scalable Multi-level Checkpointing System.

[BibT_eX]

[DOI]

Proceedings of the Conference on High Performance Computing Networking, 2010

Efficient MPI Support for Advanced Hybrid Programming Models.

[BibT_eX]

[DOI]

Torsten Hoefler

Brian Barrett

Andrew Lumsdaine

Proceedings of the Recent Advances in the Message Passing Interface, 2010

ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale.

[BibT_eX]

[DOI]

Xing Wu

Proceedings of the Applied Parallel and Scientific Computing, 2010

Towards an Error Model for OpenMP.

[BibT_eX]

[DOI]

Andrey Churbanov

Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

A Case for Including Transactions in OpenMP.

[BibT_eX]

[DOI]

Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

A ROSE-Based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries.

[BibT_eX]

[DOI]

Thomas Panas

Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

A Proposal for User-Defined Reductions in OpenMP.

[BibT_eX]

[DOI]

Alejandro Duran

Roger Ferrer

Michael Klemm

Eduard Ayguadé

Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Hybrid MPI/OpenMP power-aware computing.

[BibT_eX]

[DOI]

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Power-aware MPI task aggregation prediction for high-end computing systems.

[BibT_eX]

[DOI]

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Using focused regression for accurate time-constrained scaling of scientific applications.

[BibT_eX]

[DOI]

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Clustering performance data efficiently at massive scales.

[BibT_eX]

[DOI]

Robert J. Fowler

Daniel A. Reed

Proceedings of the 24th International Conference on Supercomputing, 2010

Exploitation of Dynamic Communication Patterns through Static Analysis.

[BibT_eX]

[DOI]

Robert Preissl

Proceedings of the 39th International Conference on Parallel Processing, 2010

Comparing Scalability Prediction Strategies on an SMP of CMPs.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

AutomaDeD: Automata-based debugging for dissimilar parallel tasks.

[BibT_eX]

[DOI]

Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Minimizing MPI Resource Contention in Multithreaded Multicore Environments.

[BibT_eX]

[DOI]

Rajeev Thakur

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

2009

ScalaTrace: Scalable compression and replay of communication traces for high-performance computing.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 2009

CLOMP: Accurately Characterizing OpenMP Application Overheads.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 2009

Scalable temporal order analysis for large scale debugging.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

MUST: A Scalable Approach to Runtime Error Detection in MPI Programs.

[BibT_eX]

[DOI]

Proceedings of the Tools for High Performance Computing 2009, 2009

PSMalloc: content based memory management for MPI applications.

[BibT_eX]

[DOI]

Proceedings of the 10th workshop on MEmory performance, 2009

Machine learning based online performance prediction for runtime parallelization and task scheduling.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

Adagio: making DVS practical for complex HPC applications.

[BibT_eX]

[DOI]

Vincent W. Freeh

Tyler K. Bletsch

Proceedings of the 23rd international conference on Supercomputing, 2009

A graph based approach for MPI deadlock detection.

[BibT_eX]

[DOI]

Proceedings of the 23rd international conference on Supercomputing, 2009

2008

Efficient architectural design space exploration via predictive modeling.

[BibT_eX]

[DOI]

ACM Trans. Archit. Code Optim., 2008

BlueGene/L applications: Parallelism On a Massive Scale.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2008

Lessons learned at 208K: towards debugging millions of cores.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Scalable load-balance measurement for SPMD codes.

[BibT_eX]

[DOI]

Robert J. Fowler

Daniel A. Reed

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

On the Performance of Transparent MPI Piggyback Messages.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Preserving time in large-scale communication traces.

[BibT_eX]

[DOI]

Prasun Ratn

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Soft error vulnerability of iterative linear algebra methods.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

A regression-based approach to scalability prediction.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Detecting Patterns in MPI Communication Traces.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Overcoming Scalability Challenges for Tool Daemon Launching.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Using MPI Communication Patterns to Guide Source Code Transformations.

[BibT_eX]

[DOI]

Robert Preissl

Dieter Kranzlmüller

Proceedings of the Computational Science, 2008

Prediction models for multi-dimensional power-performance optimization on many cores.

[BibT_eX]

[DOI]

Matthew Curtis-Maury

Ankur Shah

Filip Blagojevic

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies.

[BibT_eX]

[DOI]

Andy Yoo

ACM Trans. Program. Lang. Syst., 2007

Dynamic Binary Instrumentation and Data Aggregation on Large Scale Systems.

[BibT_eX]

[DOI]

Steven Y. Ko

Int. J. Parallel Program., 2007

Complete Formal Specification of the OpenMP Memory Model.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 2007

Predicting parallel application performance via machine learning approaches.

[BibT_eX]

[DOI]

Karan Singh

Rich Caruana

Concurr. Comput. Pract. Exp., 2007

PNMPI tools: a whole lot greater than the sum of their parts.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Bounding energy consumption in large-scale MPI programs.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Methods of inference and learning for performance modeling of parallel applications.

[BibT_eX]

[DOI]

Benjamin C. Lee

David M. Brooks

Karan Singh

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Benchmarking the Stack Trace Analysis Tool for BlueGene/L.

[BibT_eX]

Proceedings of the Parallel Computing: Architectures, 2007

Scalable Compression and Replay of Communication Traces in Massively Parallel Environments.

[BibT_eX]

[DOI]

Michael Noeth

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Stack Trace Analysis for Large Scale Debugging.

[BibT_eX]

[DOI]

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Pynamic: the Python Dynamic Benchmark.

[BibT_eX]

[DOI]

Patrick Miller

Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

Practical Differential Profiling.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2007, 2007

Identifying energy-efficient concurrency levels using machine learning.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

Analysis of cache-coherence bottlenecks with hybrid hardware/software techniques.

[BibT_eX]

[DOI]

Jaydeep Marathe

ACM Trans. Archit. Code Optim., 2006

Poster reception - Scalable compression and replay of communication traces in massively parallel environments.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Gordon Bell finalists I - Large-scale electronic structure calculations of high-Z metals on the BlueGene/L platform.

[BibT_eX]

[DOI]

François Gygi

Erik W. Draeger

Christoph W. Ueberhuber

Juergen Lorenz

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Poster reception - Patterns in parallel programs: toward high-level understanding of large-scale traces.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Formal Specification of the OpenMP Memory Model.

[BibT_eX]

[DOI]

Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006

Improving distributed memory applications testing by message perturbation.

[BibT_eX]

[DOI]

Richard W. Vuduc

Andreas Sæbjørnsen

Proceedings of the 4th Workshop on Parallel and Distributed Systems: Testing, 2006

Dynamic program phase detection in distributed shared-memory multiprocessors.

[BibT_eX]

[DOI]

José F. Martínez

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A Flexible and Dynamic Infrastructure for MPI Tool Interoperability.

[BibT_eX]

[DOI]

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Exploring Unexpected Behavior in MPI.

[BibT_eX]

[DOI]

Dieter Kranzlmüller

Proceedings of the High Performance Computing and Communications, 2006

Topic 1: Support Tools and Environments.

[BibT_eX]

[DOI]

Matthias Brehm

Luiz De Rose

Tomàs Margalef

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Toward Enhancing OpenMP's Work-Sharing Directives.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Efficiently exploring architectural design spaces via predictive modeling.

[BibT_eX]

[DOI]

Rich Caruana

Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2005

Scalable dynamic binary instrumentation for Blue Gene/L.

[BibT_eX]

[DOI]

Andrew Bernat

Steven Y. Ko

SIGARCH Comput. Archit. News, 2005

Evaluating high-performance computers.

[BibT_eX]

[DOI]

Jeffrey S. Vetter

Lynn Kissel

John May

Sheila Vaidya

Concurr. Pract. Exp., 2005

Large-Scale First-Principles Molecular Dynamics simulations on the BlueGene/L Platform using the Qbox code.

[BibT_eX]

[DOI]

Christoph W. Ueberhuber

Stefan Kral

John A. Gunnels

James C. Sexton

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Tera-Scalable Algorithms for Variable-Density Elliptic Hydrodynamics with Spectral Accuracy.

[BibT_eX]

[DOI]

Robert K. Yates

Michael L. Welcome

Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

The OpenMP Memory Model.

[BibT_eX]

[DOI]

Jay P. Hoeflinger

Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Improving the computational intensity of unstructured mesh applications.

[BibT_eX]

[DOI]

Brian S. White

Brian Miller

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

A hybrid hardware/software approach to efficiently determine cache coherence bottlenecks.

[BibT_eX]

[DOI]

Jaydeep Marathe

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Scaling physics and material science applications on a massively parallel Blue Gene/L system.

[BibT_eX]

[DOI]

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

An Approach to Performance Prediction for Parallel Applications.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2003

A C++ Infrastructure for Automatic Introduction and Translation of OpenMP Directives.

[BibT_eX]

[DOI]

Markus Schordan

Qing Yi

Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

DMPL: An OpenMP DLL Debugging Interface.

[BibT_eX]

[DOI]

James Cownie

John Del Signore Jr.

Karen H. Warren

Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Identifying and Exploiting Spatial Regularity in Data Memory References.

[BibT_eX]

[DOI]

Tushar Mohan

Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Semantic-Driven Parallelization of Loops Operating on User-Defined Containers.

[BibT_eX]

[DOI]

Markus Schordan

Qing Yi

Proceedings of the Languages and Compilers for Parallel Computing, 2003

METRIC: Tracking Down Inefficiencies in the Memory Hierarchy via Binary Rewriting.

[BibT_eX]

[DOI]

Jaydeep Marathe

Tushar Mohan

Andy Yoo

Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

2002

A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids.

[BibT_eX]

[DOI]

Nicholas T. Karonis

Ian T. Foster

William Gropp

Ewing L. Lusk

CoRR, 2002

2000

Delta coherence protocols.

[BibT_eX]

[DOI]

Craig Williams

Paul F. Reynolds Jr.

IEEE Concurr., 2000

Dynamic Software Testing of MPI Applications with Umpire.

[BibT_eX]

[DOI]

Jeffrey S. Vetter

Proceedings of Supercomputing 2000, 2000

Exploiting Hierarchy in Parallel Computer Networks to Optimize Collective Operation Performance.

[BibT_eX]

[DOI]

Nicholas T. Karonis

Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

1999

Benchmarking Pthreads Performance.

[BibT_eX]

John May

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

Experience with Mixed MPI/Threaded Programming Models.

[BibT_eX]

John M. May

Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

Accurately Measuring MPI Broadcasts in a Computational Grid.

[BibT_eX]

[DOI]