Ahmad Abdelfattah

Orcid: 0000-0001-5054-4784

According to our database, Ahmad Abdelfattah authored at least 56 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications.
Future Gener. Comput. Syst., 2024

2023

GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

PAQR: Pivoting Avoiding QR factorization.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022

Reproducibility Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher.
Dataset, August, 2022

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Portable and Efficient Dense Linear Algebra in the Beginning of the Exascale Era.
Proceedings of the IEEE/ACM International Workshop on Performance, 2022

Batch QR Factorization on GPUs: Design, Optimization, and Tuning.
Proceedings of the Computational Science - ICCS 2022, 2022

GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines.
ACM Trans. Math. Softw., 2021

GPU algorithms for Efficient Exascale Discretizations.
Parallel Comput., 2021

libCEED: Fast algebra for high-order element-based discretizations.
J. Open Source Softw., 2021

Efficient exascale discretizations: High-order finite element methods.
Int. J. High Perform. Comput. Appl., 2021

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.
Int. J. High Perform. Comput. Appl., 2021

2020
Matrix multiplication on batches of small matrices in half and half-complex precisions.
J. Parallel Distributed Comput., 2020

MAGMA templates for scalable linear algebra on emerging architectures.
Int. J. High Perform. Comput. Appl., 2020

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.
CoRR, 2020

High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs.
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs.
Proceedings of the Computational Science - ICCS 2020, 2020

Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

2019
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Comput., 2019

Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019

Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Massively Parallel Automated Software Tuning.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Progressive Optimization of Batched LU Factorization on GPUs.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.
IEEE Trans. Parallel Distributed Syst., 2018

Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.
IEEE Trans. Parallel Distributed Syst., 2018

Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.
J. Comput. Sci., 2018

Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018

Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

2017
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA.
J. Comput. Sci., 2017

With Extreme Computing, the Rules Have Changed.
Comput. Sci. Eng., 2017

High-performance Cholesky factorization for GPU-only execution.
Proceedings of the General Purpose GPUs, 2017

Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.
Proceedings of the International Conference on Supercomputing, 2017

Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.
Proceedings of the International Conference on Computational Science, 2017

2016
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
ACM Trans. Math. Softw., 2016

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.
Concurr. Comput. Pract. Exp., 2016

Linear algebra software for large-scale accelerated multicore computing.
Acta Numer., 2016

Performance, Design, and Autotuning of Batched GEMM for GPUs.
Proceedings of the High Performance Computing - 31st International Conference, 2016

On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016

High-Performance Tensor Contractions for GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016

High-Performance Matrix-Matrix Multiplications of Very Small Matrices.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Accelerating Scientific Applications using High Performance Dense and Sparse Linear Algebra Kernels on GPUs.
PhD thesis, 2015

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems.
Supercomput. Front. Innov., 2015

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

2014
Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System.
Proceedings of the International Conference for High Performance Computing, 2014

High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2012
Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators.
Proceedings of the High Performance Computing for Computational Science, 2012

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

