2025
Preparing MPICH for exascale.
Int. J. High Perform. Comput. Appl., 2025
2024
Integrating Interactive Performance Analysis in Jupyter Notebooks for Parallel Programming Education.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
HIPS 2024 Preface and Committees.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
Simplifying non-contiguous data transfer with MPI for Python.
J. Supercomput., November, 2023
Special issue on new trends in high-performance computing: Software systems and applications.
Softw. Pract. Exp., 2023
2022
Improving cryptanalytic applications with stochastic runtimes on GPUs and multicores.
Parallel Comput., 2022
12th IEEE International Workshop on Accelerators and Hybrid Emerging Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Quality-aware scheduling of on-board and off-board data analysis in vehicle development.
Proceedings of the 5th IEEE International Conference on Industrial Cyber-Physical Systems, 2022
2021
Comparing Data Staging Techniques for Large Scale Brain Images.
IEEE Trans. Emerg. Top. Comput., 2021
Improving Cryptanalytic Applications with Stochastic Runtimes on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
2020
Fall-detection on a wearable micro controller using machine learning algorithms.
Proceedings of the IEEE International Conference on Smart Computing, 2020
Lessons learned from comparing C-CUDA and Python-Numba for GPU-Computing.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020
Workshop 8: AsHES Accelerators and Hybrid Exascale Systems.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Implementation and Evaluation of CUDA-Unified Memory in Numba.
Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020
2019
IO Challenges for Human Brain Atlasing Using Deep Learning Methods - An In-Depth Analysis.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019
Evaluating the Benefits of Key-Value Databases for Scientific Applications.
Proceedings of the Computational Science - ICCS 2019, 2019
2017
InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU.
Int. J. High Perform. Comput. Appl., 2017
Why is MPI so slow?: analyzing the fundamental limits in implementing MPI-3.1.
Proceedings of the International Conference for High Performance Computing, 2017
Hexe: A Toolkit for Heterogeneous Memory Management.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017
A Performance Study of UCX over InfiniBand.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
2016
Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy.
Parallel Comput., 2016
2015
Analyzing communication models for distributed thread-collaborative processors in terms of energy and time.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015
2014
Direct communication methods for distributed GPUs.
PhD thesis, 2014
Energy-efficient stencil computations on distributed GPUs using dynamic parallelism and GPU-controlled communication.
Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014
Infiniband-Verbs on GPU: A Case Study of Controlling an Infiniband Network Device from the GPU.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Analyzing Put/Get APIs for Thread-Collaborative Processors.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014
Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014
2013
GPI2 for GPUs: A PGAS framework for efficient communication in hybrid clusters.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013
GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
2012
GASPI - A Partitioned Global Address Space Programming Interface.
Proceedings of the Facing the Multicore-Challenge, 2012