2025

Balanced and Elastic End-to-end Training of Dynamic LLMs.

[DOI]

,

Muhammed Abdullah Soyturk

,

CoRR, May, 2025

A Device-Side Execution Model for Multi-GPU Task Graphs.

[DOI]

Ilyas Turimbetov

,

,

Proceedings of the 39th ACM International Conference on Supercomputing, 2025

Uniconn: A Uniform High-Level Communication Library for Portable Multi-GPU Programming.

[DOI]

,

Sinan Ekmekçibasi

,

Khaled Z. Ibrahim

,

,

Proceedings of the IEEE International Conference on Cluster Computing, 2025

2024

The Landscape of GPU-Centric Communication.

[DOI]

,

Ilyas Turimbetov

,

Mohammad Kefah Taha Issa

,

,

,

Daniele De Sensi

,

Ismayil Ismayilov

CoRR, 2024

A Sparse Tensor Generator with Efficient Feature Extraction.

[DOI]

,

,

,

CoRR, 2024

Optimizing GNN-Based Multiple Object Tracking on a Graphcore IPU.

[DOI]

Mustafa Orkun Acar

,

,

Proceedings of the High Performance Computing. ISC High Performance 2024 International Workshops, 2024

P-MoVE: Performance Monitoring and Visualization with Encoded Knowledge.

[DOI]

,

,

José A. Morgado

,

Aleksandar Ilic

,

,

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Autonomous Execution for Multi-GPU Systems: Compiler Support.

[DOI]

Javid Baydamirli

,

,

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Snoopie: A Multi-GPU Communication Profiler and Visualizer.

[DOI]

Mohammad Kefah Taha Issa

,

Muhammad Aditya Sasongko

,

Ilyas Turimbetov

,

Javid Baydamirli

,

,

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

2023

Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison.

[DOI]

Muhammad Aditya Sasongko

,

,

Paul H. J. Kelly

,

IEEE Trans. Parallel Distributed Syst., May, 2023

Precise event sampling-based data locality tools for AMD multicore architectures.

[DOI]

Muhammad Aditya Sasongko

,

,

Paul H. J. Kelly

,

Concurr. Comput. Pract. Exp., 2023

Bringing Order to Sparsity: A Sparse Matrix Reordering Study on Multicore CPUs.

[DOI]

James D. Trotter

,

Sinan Ekmekçibasi

,

Johannes Langguth

,

,

,

Aleksandar Ilic

,

Proceedings of the International Conference for High Performance Computing, 2023

Multi-GPU Communication Schemes for Iterative Solvers: When CPUs are Not in Charge.

[DOI]

Ismayil Ismayilov

,

Javid Baydamirli

,

,

,

Proceedings of the 37th International Conference on Supercomputing, 2023

2022

ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer.

[DOI]

Muhammad Aditya Sasongko

,

,

Mandana Bagheri-Marzijarani

,

ACM Trans. Archit. Code Optim., 2022

Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection.

[DOI]

,

,

,

,

Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

2021

A Split Execution Model for SpTRSV.

[DOI]

,

,

IEEE Trans. Parallel Distributed Syst., 2021

A computational-graph partitioning method for training memory-constrained DNNs.

[DOI]

Fareed Qararyah

,

,

,

Mehmet Esat Belviranli

,

Parallel Comput., 2021

Structured Adaptive Mesh Refinement Adaptations to Retain Performance Portability With Increasing Heterogeneity.

[DOI]

,

,

Carsten Burstedde

,

Michael L. Norman

,

,

Comput. Sci. Eng., 2021

Monitoring Collective Communication Among GPUs.

[DOI]

Muhammet Abdullah Soytürk

,

Palwisha Akhtar

,

,

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Low-Overhead Reuse Distance Profiling Tool for Multicore.

[DOI]

Muhammad Aditya Sasongko

,

,

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

2020

TIGER: Topology-aware Assignment using Ising machines Application to Classical Algorithm Tasks and Quantum Circuit Gates.

[DOI]

Anastasiia Butko

,

Ilyas Turimbetov

,

George Michelogiannakis

,

,

,

CoRR, 2020

Adaptive Level Binning: A New Algorithm for Solving Sparse Triangular Systems.

[DOI]

,

Buğra Sipahioğlu

,

,

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

Tiling-Based Programming Model for Structured Grids on GPU Clusters.

[DOI]

,

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

A Prediction Framework for Fast Sparse Triangular Solves.

[DOI]

,

,

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

ComScribe: Identifying Intra-node GPU Communication.

[DOI]

Palwisha Akhtar

,

,

Fareed Mohammad Qararyah

,

Proceedings of the Benchmarking, Measuring, and Optimizing, 2020

2019

Communication analysis and optimization of 3D front tracking method for multiphase flow simulations.

[DOI]

Muhammed Nufail Farooqi

,

Daulet Izbassarov

,

Metin Muradoglu

,

Int. J. High Perform. Comput. Appl., 2019

Asynchronous AMR on Multi-GPUs.

[DOI]

Muhammed Nufail Farooqi

,

,

,

,

,

Proceedings of the High Performance Computing, 2019

ComDetective: a lightweight communication detection tool for threads.

[DOI]

Muhammad Aditya Sasongko

,

,

Palwisha Akhtar

,

Proceedings of the International Conference for High Performance Computing, 2019

Program analysis for process migration.

[DOI]

,

Ilyas Turimbetov

,

Proceedings of the 8th ACM SIGPLAN International Workshop on State Of the Art in Program Analysis, 2019

2018

Load Balancing for Parallel Multiphase Flow Simulation.

[DOI]

,

Muhammed Nufail Farooqi

,

Sci. Program., 2018

Output nondeterminism detection for programming models combining dataflow with shared memory.

[DOI]

Hassan Salehe Matar

,

,

,

Parallel Comput., 2018

Special issue on High performance computing conference (BASARIM-2017).

[DOI]

,

Mehmet S. Aktas

Concurr. Comput. Pract. Exp., 2018

BindMe: A thread binding library with advanced mapping algorithms.

[DOI]

Pirah Noor Soomro

,

Muhammad Aditya Sasongko

,

Concurr. Comput. Pract. Exp., 2018

Fast multidimensional reduction and broadcast operations on GPU for machine learning.

[DOI]

,

Enis Berk Çoban

,

,

,

Concurr. Comput. Pract. Exp., 2018

Phase asynchronous AMR execution for productive and performant astrophysical flows.

[DOI]

Muhammed Nufail Farooqi

,

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2018

Phase-Based Data Placement Scheme for Heterogeneous Memory Systems.

[DOI]

Mohammad Shakeel Laghari

,

,

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Runtime Determinacy Race Detection for OpenMP Tasks.

[DOI]

Hassan Salehe Matar

,

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017

Trends in Data Locality Abstractions for HPC Systems.

[DOI]

IEEE Trans. Parallel Distributed Syst., 2017

Access pattern-aware data placement for hybrid DRAM/NVM.

[DOI]

Didem Unat Erten

Turkish J. Electr. Eng. Comput. Sci., 2017

Object Placement for High Bandwidth Memory Augmented with High Capacity Memory.

[DOI]

Mohammad Shakeel Laghari

,

Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

EmbedSanitizer: Runtime Race Detection Tool for 32-bit Embedded ARM.

[DOI]

Hassan Salehe Matar

,

,

Proceedings of the Runtime Verification - 17th International Conference, 2017

Overlapping Data Transfers with Computation on GPU with Tiles.

[DOI]

,

,

,

,

Proceedings of the 46th International Conference on Parallel Processing, 2017

Nonintrusive AMR Asynchrony for Communication Optimization.

[DOI]

Muhammed Nufail Farooqi

,

,

,

,

,

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016

BoxLib with Tiling: An Adaptive Mesh Refinement Software Framework.

[DOI]

,

,

,

,

,

SIAM J. Sci. Comput., 2016

BoxLib with Tiling: An AMR Software Framework.

[DOI]

,

,

,

,

,

CoRR, 2016

TiDA: High-Level Programming Abstractions for Data Locality Management.

[DOI]

,

,

,

Muhammed Nufail Farooqi

,

,

George Michelogiannakis

,

,

Proceedings of the High Performance Computing - 31st International Conference, 2016

Perilla: metadata-based optimizations of an asynchronous runtime for adaptive mesh refinement.

[DOI]

,

,

,

,

Muhammed Nufail Farooqi

,

Proceedings of the International Conference for High Performance Computing, 2016

2015

ExaSAT: An exascale co-design tool for performance modeling.

[DOI]

,

,

,

Samuel Williams

,

,

,

Int. J. High Perform. Comput. Appl., 2015

2014

Abstract machine models and proxy architectures for exascale computing.

[DOI]

,

Richard F. Barrett

,

,

,

,

Jeanine E. Cook

,

,

Simon D. Hammond

,

Karl S. Hemmert

,

Suzanne M. Kelly

,

,

,

David R. Resnick

,

Arun F. Rodrigues

,

,

,

,

Nicholas J. Wright

Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

2013

A new approach to interactive viewpoint selection for volume data sets.

[DOI]

,

,

,

Jürgen P. Schulze

Inf. Vis., 2013

Software Design Space Exploration for Exascale Combustion Co-design.

[DOI]

,

,

Michael Lijewski

,

,

,

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

2012

Domain-specific translator and optimizer for massive on-chip parallelism.

[DOI]

PhD thesis, 2012

Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset.

[DOI]

,

,

,

,

Proceedings of the International Conference on Computational Science, 2012

Accelerating a 3D Finite-Difference Earthquake Simulation with a C-to-CUDA Translator.

[DOI]

,

,

,

,

Comput. Sci. Eng., 2012

Interactive data-centric viewpoint selection.

[DOI]

,

,

,

Jürgen P. Schulze

Proceedings of the Visualization and Data Analysis 2012, 2012

Modeling and Predicting Performance of High Performance Computing Applications on Hardware Accelerators.

[DOI]

Mitesh R. Meswani

,

Laura Carrington

,

,

,

,

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011

Modeling and predicting application performance on hardware accelerators.

[DOI]

Mitesh R. Meswani

,

Laura Carrington

,

,

,

,

Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Mint: realizing CUDA performance in 3D stencil methods with annotated C.

[DOI]

,

,

Proceedings of the 25th International Conference on Supercomputing, Tucson, AZ, USA, May 31, 2011

2009

An Adaptive Sub-sampling Method for In-memory Compression of Scientific Data.

[DOI]

,

Theodore Hromadka III

,

Proceedings of the 2009 Data Compression Conference (DCC 2009), 2009