Rahulkumar Gayatri

According to our database, Rahulkumar Gayatri authored at least 20 papers between 2012 and 2024.

On csauthors.net:


Asynchronous-Many-Task Systems: Challenges and Opportunities - Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX.
CoRR, 2024

Scaling and performance portability of the particle-in-cell scheme for plasma physics applications through mini-apps targeting exascale architectures.
Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing, 2024

Leveraging LLVM OpenMP GPU Offload Optimizations for Kokkos Applications.
Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024

The Kokkos OpenMPTarget Backend: Implementation and Lessons Learned.
Proceedings of the OpenMP: Advanced Task-Based, Device and Compiler Programming, 2023

Kokkos 3: Programming Model Extensions for the Exascale Era.
IEEE Trans. Parallel Distributed Syst., 2022

ALPINE: A set of performance portable plasma physics particle-in-cell mini-apps for exascale computing.
CoRR, 2022

A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Billion atom molecular dynamics simulations of carbon at extreme conditions and experimental time and length scales.
Proceedings of the International Conference for High Performance Computing, 2021

Non-recurring engineering (NRE) best practices: a case study with the NERSC/NVIDIA OpenMP contract.
Proceedings of the International Conference for High Performance Computing, 2021

Case Study of Using Kokkos and SYCL as Performance-Portable Frameworks for Milc-Dslash Benchmark on NVIDIA, AMD and Intel GPUs.
Proceedings of the International Workshop on Performance, 2021

Rapid Exploration of Optimization Strategies on Advanced Architectures using TestSNAP and LAMMPS.
CoRR, 2020

Experiences in porting mini-applications to OpenACC and OpenMP on heterogeneous systems.
Concurr. Comput. Pract. Exp., 2020

Timemory: Modular Performance Analysis for HPC.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Evaluating Performance Portability of OpenMP for SNAP on NVIDIA, Intel, and AMD GPUs Using the Roofline Methodology.
Proceedings of the Accelerator Programming Using Directives - 7th International Workshop, 2020

Comparing Managed Memory and ATS with and without Prefetching on NVIDIA Volta GPUs.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

A Novel Multi-level Integrated Roofline Model Approach for Performance Characterization.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

A Case Study for Performance Portability Using OpenMP 4.5.
Proceedings of the Accelerator Programming Using Directives - 5th International Workshop, 2018

TERAFLUX: Harnessing dataflow in next generation teradevices.
Microprocess. Microsystems, 2014

Loop level speculation in a task based programming model.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Transactional Access to Shared Memory in StarSs, a Task Based Programming Model.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
