2024

F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems.

Francesco Antici

,

Andrea Bartolini

,

,

Zeynep Kiziltan

,

Dataset, June, 2024

Configurable Non-uniform All-to-all Algorithms.

,

,

,

CoRR, 2024

Benchmarking in the Datacenter (BID): Expanding to the Cloud.

,

Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

SPMD IR: Unifying SPMD and Multi-value IR Showcased for Static Verification of Collectives.

,

,

,

Matthias S. Müller

Proceedings of the Recent Advances in the Message Passing Interface, 2024

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.

,

,

Daniele De Sensi

,

,

,

,

,

Marek Konieczny

,

Kartik Lakhotia

,

,

,

Fabrizio Petrini

,

Torsten Hoefler

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Automatic Parallelization and OpenMP Offloading of Fortran Array Notation.

,

,

,

Johannes Doerfert

Proceedings of the Advancing OpenMP for Future Accelerators, 2024

Evaluation of Vectorization Methods on Arm SVE Using the Exo Language.

,

,

,

Proceedings of the IEEE International Conference on Cluster Computing, 2024

Retargeting and Respecializing GPU Workloads for Performance Portability.

,

Oleksandr Zinenko

,

,

,

William S. Moses

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023

At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.

,

,

,

,

,

,

,

Miquel Pericàs

,

,

,

Aleksandr Drozd

,

Satoshi Matsuoka

ACM Trans. Archit. Code Optim., December, 2023

Myths and legends in high-performance computing.

Satoshi Matsuoka

,

,

,

Aleksandr Drozd

,

Torsten Hoefler

Int. J. High Perform. Comput. Appl., July, 2023

Towards Collaborative Continuous Benchmarking for HPC.

,

,

,

,

,

Stephanie Brink

,

,

,

,

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023

Augmenting ML-based Predictive Modelling with NLP to Forecast a Job's Power Consumption.

Francesco Antici

,

,

,

Zeynep Kiziltan

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs.

William S. Moses

,

,

,

,

Johannes Doerfert

,

Oleksandr Zinenko

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

2022

Preparing for the Future - Rethinking Proxy Applications.

Satoshi Matsuoka

,

,

,

Aleksandr Drozd

,

Andrew A. Chien

,

,

Jeffrey S. Vetter

,

Comput. Sci. Eng., 2022

Preparing for the Future - Rethinking Proxy Apps.

Satoshi Matsuoka

,

,

,

Aleksandr Drozd

,

,

Andrew A. Chien

,

Jeffrey S. Vetter

,

CoRR, 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.

,

,

,

,

,

,

,

Miquel Pericàs

,

,

,

Aleksandr Drozd

,

Satoshi Matsuoka

CoRR, 2022

Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning.

Truong Thao Nguyen

,

François Trahay

,

,

Aleksandr Drozd

,

,

,

,

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.

,

,

Marcel Schneider

,

Marek Konieczny

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

CoRR, 2021

MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

,

,

Aleksandr Drozd

,

,

,

,

,

Daichi Mukunoki

,

,

,

Satoshi Matsuoka

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

A64FX - Your Compiler You Must Decide!

Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020

High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.

,

,

Marcel Schneider

,

Marek Konieczny

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

CoRR, 2020

White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.

Roman Iakymchuk

,

Daichi Mukunoki

,

,

Fabienne Jézéquel

,

Toshiyuki Imamura

,

Norihisa Fujita

,

,

,

,

,

Kai Torben Ohlhus

,

,

,

,

,

,

,

CoRR, 2020

Scaling distributed deep learning workloads beyond the memory capacity with KARMA.

,

,

Truong Thao Nguyen

,

Aleksandr Drozd

,

,

,

,

Satoshi Matsuoka

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2020

Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning.

,

,

,

,

,

,

Franck Cappello

,

Kathryn M. Mohror

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019

HyperX topology: first at-scale implementation and comparison to the fat-tree.

,

Satoshi Matsuoka

,

,

,

,

,

Shin'ichi Miura

,

,

Dennis Lee Floyd

,

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2019

Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?

,

Kazuaki Matsumura

,

,

,

,

Toshiki Tsuchikawa

,

,

,

Satoshi Matsuoka

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees?

,

Satoshi Matsuoka

,

,

,

,

,

Shin'ichi Miura

,

,

Dennis Lee Floyd

,

Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

2018

Interactive Investigation of Traffic Congestion on Fat-Tree Networks Using TreeScope.

,

,

Abhinav Bhatele

,

,

,

Valerio Pascucci

,

Peer-Timo Bremer

Comput. Graph. Forum, 2018

Mitigating inter-job interference using adaptive flow-aware routing.

,

Clara E. Cromey

,

David K. Lowenthal

,

,

,

Jayaraman J. Thiagarajan

,

Abhinav Bhatele

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2018

2017

Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks.

PhD thesis, 2017

Toward reliable validation of HPC network simulation models.

,

,

,

,

,

Jianping Kelvin Li

,

Abhinav Bhatele

,

Christopher D. Carothers

,

,

Proceedings of the 2017 Winter Simulation Conference, 2017

Preliminary Performance Analysis of Multi-rail Fat-tree Networks.

,

,

,

,

Abhinav Bhatele

,

Christopher D. Carothers

,

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2017

2016

A scalable framework for the global offline community land model ensemble simulation.

,

,

,

,

Daniel M. Ricciuto

Int. J. Comput. Sci. Eng., 2016

Scheduling-aware routing for supercomputers.

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016

Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing.

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015

Hardware-Centric Analysis of Network Performance for MPI Applications.

,

,

Satoshi Matsuoka

Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

2014

Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures.

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014

Tracing Data Movements within MPI Collectives.

,

,

Satoshi Matsuoka

Proceedings of the 21st European MPI Users' Group Meeting, 2014

2012

Runtime Tracing of the Community Earth System Model: Feasibility Study and Benefits.

,

Proceedings of the International Conference on Computational Science, 2012

2011

Deadlock-Free Oblivious Routing for Arbitrary Topologies.

,

Torsten Hoefler

,

Wolfgang E. Nagel

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011