Proteus: Portable Runtime Optimization of GPU Kernel Execution with Just-in-Time Compilation.
Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025
An Exploration of Global Optimization Strategies for Autotuning OpenMP-based Codes.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Machine Learning-Driven Adaptive OpenMP For Portable Performance on Heterogeneous Systems.
CoRR, 2023
Extending OpenMP for Machine Learning-Driven Adaptation.
Proceedings of the Accelerator Programming Using Directives - 8th International Workshop, 2021
Artemis: Automatic Runtime Tuning of Parallel Execution Parameters Using Machine Learning.
Proceedings of the High Performance Computing - 36th International Conference, 2021
Umpire: Application-focused management and coordination of complex hierarchical memory.
IBM J. Res. Dev., 2020
CodeSeer: input-dependent code variants selection via machine learning.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
RAJA: Portable Performance for Large-Scale Scientific Applications.
Proceedings of the 2019 IEEE/ACM International Workshop on Performance, 2019
Performance portable C++ programming with RAJA.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
FuncyTuner: Auto-tuning Scientific Applications With Per-loop Compilation.
Proceedings of the 48th International Conference on Parallel Processing, 2019
Int. J. High Perform. Comput. Appl., 2018
Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
TeaLeaf: A Mini-Application to Enable Design-Space Explorations for Iterative Sparse Linear Solvers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Flexible Data Aggregation for Performance Profiling.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Caliper: performance introspection for HPC software stacks.
Proceedings of the International Conference for High Performance Computing, 2016
Fast Multi-parameter Performance Modeling.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
Towards scalable adaptive mesh refinement on future parallel architectures.
PhD thesis, 2015
Resident Block-Structured Adaptive Mesh Refinement on Thousands of Graphics Processing Units.
Proceedings of the 44th International Conference on Parallel Processing, 2015
Achieving portability and performance through OpenACC.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014
Towards Automated Memory Model Generation Via Event Tracing.
Comput. J., 2013
Analysing the influence of InfiniBand choice on OpenMPI memory consumption.
Proceedings of the International Conference on High Performance Computing & Simulation, 2013
Accelerating Hydrocodes with OpenACC, OpenCL and CUDA.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Performance Modelling of Magnetohydrodynamics Codes.
Proceedings of the Computer Performance Engineering - 9th European Workshop, 2012
Optimisation of Patch Distribution Strategies for AMR Applications.
Proceedings of the Computer Performance Engineering - 9th European Workshop, 2012