2024
F-DATA: A Fugaku Workload Dataset for Job-centric Predictive Modelling in HPC Systems.
Dataset, June, 2024
Configurable Non-uniform All-to-all Algorithms.
CoRR, 2024
Benchmarking in the Datacenter (BID): Expanding to the Cloud.
Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024
SPMD IR: Unifying SPMD and Multi-value IR Showcased for Static Verification of Collectives.
Proceedings of the Recent Advances in the Message Passing Interface, 2024
A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
Automatic Parallelization and OpenMP Offloading of Fortran Array Notation.
Proceedings of the Advancing OpenMP for Future Accelerators, 2024
Evaluation of Vectorization Methods on Arm SVE Using the Exo Language.
Proceedings of the IEEE International Conference on Cluster Computing, 2024
Retargeting and Respecializing GPU Workloads for Performance Portability.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024
2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023
Myths and legends in high-performance computing.
Int. J. High Perform. Comput. Appl., July, 2023
Towards Collaborative Continuous Benchmarking for HPC.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Augmenting ML-based Predictive Modelling with NLP to Forecast a Job's Power Consumption.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
2022
Preparing for the Future - Rethinking Proxy Applications.
Comput. Sci. Eng., 2022
Preparing for the Future - Rethinking Proxy Apps.
CoRR, 2022
At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022
Why Globally Re-shuffle? Revisiting Data Shuffling in Large Scale Deep Learning.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
2021
High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.
IEEE Trans. Parallel Distributed Syst., 2021
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
A64FX - Your Compiler You Must Decide!
Proceedings of the IEEE International Conference on Cluster Computing, 2021
2020
High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.
CoRR, 2020
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.
CoRR, 2020
Scaling distributed deep learning workloads beyond the memory capacity with KARMA.
Proceedings of the International Conference for High Performance Computing, 2020
Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
2019
HyperX topology: first at-scale implementation and comparison to the fat-tree.
Proceedings of the International Conference for High Performance Computing, 2019
Double-Precision FPUs in High-Performance Computing: An Embarrassment of Riches?
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
The First Supercomputer with HyperX Topology: A Viable Alternative to Fat-Trees?
Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019
2018
Interactive Investigation of Traffic Congestion on Fat-Tree Networks Using TreeScope.
Comput. Graph. Forum, 2018
Mitigating inter-job interference using adaptive flow-aware routing.
Proceedings of the International Conference for High Performance Computing, 2018
2017
Routing on the Channel Dependency Graph: A New Approach to Deadlock-Free, Destination-Based, High-Performance Routing for Lossless Interconnection Networks.
PhD thesis, 2017
Toward reliable validation of HPC network simulation models.
Proceedings of the 2017 Winter Simulation Conference, 2017
Preliminary Performance Analysis of Multi-rail Fat-tree Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
2016
A scalable framework for the global offline community land model ensemble simulation.
Int. J. Comput. Sci. Eng., 2016
Scheduling-aware routing for supercomputers.
Proceedings of the International Conference for High Performance Computing, 2016
Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
2015
Hardware-Centric Analysis of Network Performance for MPI Applications.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015
2014
Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures.
Proceedings of the International Conference for High Performance Computing, 2014
Tracing Data Movements within MPI Collectives.
Proceedings of the 21st European MPI Users' Group Meeting, 2014
2012
Runtime Tracing of the Community Earth System Model: Feasibility Study and Benefits.
Proceedings of the International Conference on Computational Science, 2012
2011
Deadlock-Free Oblivious Routing for Arbitrary Topologies.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011