Azzam Haidar
Orcid: 0000-0002-3177-2084
According to our database, Azzam Haidar authored at least 79 papers between 2008 and 2023.
Timeline (chart not reproduced): papers per year, 2008 to 2023, by type (book, in proceedings, article, PhD thesis, dataset, other).
Bibliography
2023
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023
2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
2021
ACM Trans. Math. Softw., 2021
Proceedings of the Workshop on Exascale MPI, 2021
2020
Int. J. High Perform. Comput. Appl., 2020
Proceedings of the Computational Science - ICCS 2020, 2020
2019
ACM Trans. Math. Softw., 2019
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Comput., 2019
Int. J. High Perform. Comput. Netw., 2019
Concurr. Comput. Pract. Exp., 2019
2018
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.
IEEE Trans. Parallel Distributed Syst., 2018
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.
IEEE Trans. Parallel Distributed Syst., 2018
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018
J. Comput. Sci., 2018
Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.
J. Comput. Sci., 2018
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers.
Proceedings of the International Conference for High Performance Computing, 2018
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
2017
J. Comput. Sci., 2017
Proceedings of the High Performance Computing - 32nd International Conference, 2017
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the General Purpose GPUs, 2017
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.
Proceedings of the International Conference on Supercomputing, 2017
Proceedings of the International Conference on Computational Science, 2017
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
2016
Acta Numer., 2016
Proceedings of the High Performance Computing - 31st International Conference, 2016
Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016
Proceedings of the International Conference on Computational Science 2016, 2016
LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
2015
Supercomput. Front. Innov., 2015
Sci. Program., 2015
On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors.
Proceedings of the High Performance Computing - 30th International Conference, 2015
A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.
Proceedings of the High Performance Computing - 30th International Conference, 2015
Performance analysis and design of a Hessenberg reduction using stabilized blocked elementary transformations for new architectures.
Proceedings of the Symposium on High Performance Computing, 2015
Efficient implementation of quantum materials simulations on distributed CPU-GPU systems.
Proceedings of the International Conference for High Performance Computing, 2015
Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform.
Proceedings of the International Conference on Computational Science, 2015
MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
2014
Supercomput. Front. Innov., 2014
A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks.
Int. J. High Perform. Comput. Appl., 2014
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Accelerating Computation of Eigenvectors in the Dense Nonsymmetric Eigenvalue Problem.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014
Proceedings of the Numerical Computations with GPUs, 2014
2013
Parallel algebraic domain decomposition solver for the solution of augmented systems.
Adv. Eng. Softw., 2013
Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
An improved parallel singular value algorithm and its implementation for multicore hardware.
Proceedings of the International Conference for High Performance Computing, 2013
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.
Proceedings of the International Conference on Supercomputing, 2013
2012
Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem.
SIAM J. Sci. Comput., 2012
Poster: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Abstract: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
2011
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.
Concurr. Comput. Pract. Exp., 2011
Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels.
Proceedings of the Conference on High Performance Computing Networking, 2011
Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.
Proceedings of the conference ParCo 2011: Applications, Tools and Techniques on the Road to Exascale Computing, 31 August, 2011
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
2010
Using multiple levels of parallelism to enhance the performance of domain decomposition solvers.
Parallel Comput., 2010
2009
Numer. Algorithms, 2009
2008
On the parallel scalability of hybrid linear solvers for large 3D problems (French title: Sur l'extensibilité parallèle de solveurs linéaires hybrides pour des problèmes tridimensionels de grandes tailles).
PhD thesis, 2008
Parallel Comput., 2008