Ahmad Abdelfattah
ORCID: 0000-0001-5054-4784
According to our database, Ahmad Abdelfattah authored at least 56 papers between 2012 and 2024.
Bibliography
2024
Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications.
Future Gener. Comput. Syst., 2024
2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
2022
Reproducability Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher.
Dataset, August, 2022
Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the IEEE/ACM International Workshop on Performance, 2022
Proceedings of the Computational Science - ICCS 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
ACM Trans. Math. Softw., 2021
J. Open Source Softw., 2021
Int. J. High Perform. Comput. Appl., 2021
Int. J. High Perform. Comput. Appl., 2021
2020
Matrix multiplication on batches of small matrices in half and half-complex precisions.
J. Parallel Distributed Comput., 2020
Int. J. High Perform. Comput. Appl., 2020
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020
Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs.
Proceedings of the Computational Science - ICCS 2020, 2020
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
2019
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Comput., 2019
Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019
Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the 48th International Conference on Parallel Processing, 2019
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
2018
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.
IEEE Trans. Parallel Distributed Syst., 2018
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.
IEEE Trans. Parallel Distributed Syst., 2018
Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.
J. Comput. Sci., 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
2017
J. Comput. Sci., 2017
Proceedings of the General Purpose GPUs, 2017
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.
Proceedings of the International Conference on Supercomputing, 2017
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.
Proceedings of the International Conference on Computational Science, 2017
2016
KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
ACM Trans. Math. Softw., 2016
Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.
Concurr. Comput. Pract. Exp., 2016
Acta Numer., 2016
Proceedings of the High Performance Computing - 31st International Conference, 2016
On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016
Proceedings of the International Conference on Computational Science 2016, 2016
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
2015
Accelerating Scientific Applications using High Performance Dense and Sparse Linear Algebra Kernels on GPUs.
PhD thesis, 2015
Supercomput. Front. Innov., 2015
Proceedings of the Euro-Par 2015: Parallel Processing, 2015
2014
Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System.
Proceedings of the International Conference for High Performance Computing, 2014
High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014
2012
Proceedings of the High Performance Computing for Computational Science, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012