Pedro Valero-Lara

Keita Teranishi

Proceedings of the Asynchronous Many-Task Systems and Applications, 2024

IRIS Reimagined: Advancements in Intelligent Runtime System for Task-Based Programming.

Seyong Lee

Beau Johnston

Aaron R. Young

Proceedings of the Asynchronous Many-Task Systems and Applications, 2024

ChatBLAS: The First AI-Generated and Portable BLAS Library.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

JACC: Leveraging HPC Meta-Programming and Performance Portability with the Just-in-Time and LLVM-based Julia Language.

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Integrating ORNL's HPC and Neutron Facilities with a Performance-Portable CPU/GPU Ecosystem.

Christina M. Hoffmann

Rafael Ferreira da Silva

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

9th IEEE International Workshop on Automatic Performance Tuning (iWAPT 2024).

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

eCC++ : A Compiler Construction Framework for Embedded Domain-Specific Languages.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC.

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

2023

Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation.

CoRR, 2023

Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

MatRIS: Multi-level Math Library Abstraction for Heterogeneity and Performance Portability using IRIS Runtime.

Keita Teranishi

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Julia as a unifying end-to-end workflow language on the Frontier exascale system.

Rafael Ferreira da Silva

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Tiling Framework for Heterogeneous Computing of Matrix based Tiled Algorithms.

Frank Liu

Proceedings of the 2nd International Workshop on Extreme Heterogeneity Solutions, 2023

A MultiGPU Performance-Portable Solution for Array Programming Based on Kokkos.

Proceedings of the 9th ACM SIGPLAN International Workshop on Libraries, 2023

(AsHES) 2023 Keynote Speaker Agnostic Programing: "Less is More".

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes.

Valentin Churavy

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation.

Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

IRIS-DMEM: Efficient Memory Management for Heterogeneous Computing.

Frank Y. Liu

Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

2022

Propagation Pattern for Moment Representation of the Lattice Boltzmann Method.

IEEE Trans. Parallel Distributed Syst., 2022

cuConv: CUDA implementation of convolution for CNN inference.

Marc Jordà

Antonio J. Peña

Clust. Comput., 2022

KokkACC: Enhancing Kokkos with OpenACC.

Seyong Lee

Joel E. Denny

Proceedings of the 9th Workshop on Accelerator Programming Using Directives, 2022

LaRIS: Targeting Portability and Productivity for LAPACK Codes on Extreme Heterogeneous Systems by Using IRIS.

Frank Y. Liu

Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2022

SparseLU, A Novel Algorithm and Math Library for Sparse LU Factorization.

Cameron Greenwalt

Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library.

Frank Liu

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

A Portable and Heterogeneous LU Factorization on IRIS.

Jungwon Kim

Proceedings of the Euro-Par 2022: Parallel Processing Workshops, 2022

2021

Static Graphs for Coding Productivity in OpenACC.

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems.

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

2020

sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library).

J. Parallel Distributed Comput., 2020

Towards an Auto-Tuned and Task-Based SpMV (LASs Library).

Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

2019

MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain.

Parallel Comput., 2019

A Fast Solver for Large Tridiagonal Systems on Multi-Core Processors (Lass Library).

IEEE Access, 2019

Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs.

Marc Jordà

Leonel Antonio Toledo Díaz

Antonio J. Peña

IEEE Access, 2019

BLAS-3 Optimized by OmpSs Regions (LASs Library).

Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Tasking in Accelerators: Performance Evaluation.

Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

Accelerating Conjugate Gradient using OmpSs.

Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

2018

cuThomasBatch and cuThomasVBatch, CUDA Routines to compute batch of tridiagonal systems on NVIDIA GPUs.

Concurr. Comput. Pract. Exp., 2018

MPI+OpenMP Tasking Scalability for the Simulation of the Human Brain: Human Brain Project.

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Variable Batched DGEMM.

Proceedings of the 26th Euromicro International Conference on Parallel, 2018

2017

Introduction to the Special Issue on High Performance Computing Solutions for Complex Problems.

Scalable Comput. Pract. Exp., 2017

Towards HPC-Embedded. Case Study: Kalray and Message-Passing on NoC.

Ezhilmathi Krishnasamy

Scalable Comput. Pract. Exp., 2017

Heterogeneous CPU+GPU approaches for mesh refinement over Lattice-Boltzmann simulations.

Concurr. Comput. Pract. Exp., 2017

Reducing memory requirements for large size LBM simulations on GPUs.

Concurr. Comput. Pract. Exp., 2017

NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch.

Proceedings of the Parallel Processing and Applied Mathematics, 2017

Heuristics for ROSA's LTS Searching.

Fernando López Pelayo

Fernando Cuartero Gómez

Diego Cazorla

Mercedes G. Merayo

Proceedings of the Advances in Computational Intelligence, 2017

cuHinesBatch: Solving Multiple Hines systems on GPUs Human Brain Project.

Proceedings of the International Conference on Computational Science, 2017

The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems.

Proceedings of the International Conference on Computational Science, 2017

2016

Introduction to the Special Issue on High Performance Computing Solutions for Complex Problems.

Scalable Comput. Pract. Exp., 2016

Many-Task Computing on Many-Core Architectures.

Serapheim Dimitropoulos

Ioan Raicu

Scalable Comput. Pract. Exp., 2016

Leveraging the Performance of LBM-HPC for Large Sizes on GPUs Using Ghost Cells.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

2015

Accelerating fluid-solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures.

J. Comput. Sci., 2015

A Non-uniform Staggered Cartesian Grid Approach for Lattice-boltzmann Method.

Proceedings of the International Conference on Computational Science, 2015

Multi-domain Grid Refinement for Lattice-Boltzmann Simulations on Heterogeneous Platforms.

Proceedings of the 18th IEEE International Conference on Computational Science and Engineering, 2015

LBM-HPC - An Open-Source Tool for Fluid Simulations. Case Study: Unified Parallel C (UPC-PGAS).

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Accelerating solid-fluid interaction based on the immersed boundary method on multicore and GPU architectures.

J. Supercomput., 2014

Fast finite difference Poisson solvers on heterogeneous architectures.

Alfredo Pinelli

Manuel Prieto-Matías

Comput. Phys. Commun., 2014

hLCS. A Hybrid GPGPU Approach for Solving Multiple Short and Unbalanced LCS Problems.

Proceedings of the Computational Science and Its Applications - ICCSA 2014 - 14th International Conference, Guimarães, Portugal, June 30, 2014

Accelerating Solid-fluid Interaction using Lattice-boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms.

Alfredo Pinelli

Manuel Prieto-Matías

Proceedings of the International Conference on Computational Science, 2014

Multi-GPU acceleration of DARTEL (early detection of Alzheimer).

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

A GPU approach for accelerating 3D deformable registration (DARTEL) on brain biomedical images.

Proceedings of the 20th European MPI Users' Group Meeting, 2013

GPU Powered ROSA Analyzer.

Raúl Pardo

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Analysis in performance and new model for multiple kernels executions on many-core architectures.

Proceedings of the IEEE 12th International Conference on Cognitive Informatics and Cognitive Computing, 2013

2012

Block Tridiagonal Solvers on Heterogeneous Architectures.

Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

MRF Satellite Image Classification on GPU.

Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Improving the Performance for the Range Search on Metric Spaces Using a Multi-GPU Platform.

Roberto Uribe Paredes

Proceedings of the Database and Expert Systems Applications, 2012

2011

A GPU-based implementation of the MRF algorithm in ITK package.

J. Supercomput., 2011

Similarity search implementations for multi-core and many-core processors.

Roberto Uribe Paredes

Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Towards a More Efficient Use of GPUs.
