Hatem Ltaief

ORCID: 0000-0002-6897-1095

Affiliations:
  • King Abdullah University of Science and Technology, Jeddah, Saudi Arabia


According to our database, Hatem Ltaief authored at least 114 papers between 2006 and 2024.

Bibliography

2024
High performance computing seismic redatuming by inversion with algebraic compression and multiple precisions.
Int. J. High Perform. Comput. Appl., 2024

Portability and scalability evaluation of large-scale statistical modeling and prediction software through HPC-ready containers.
Future Gener. Comput. Syst., 2024

Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression.
CoRR, 2024

GPU-Accelerated Vecchia Approximations of Gaussian Processes for Geospatial Data using Batched Matrix Computations.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Leveraging the High Bandwidth of Last-Level Cache for HPC Seismic Imaging Applications.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2024

Parallel Approximations for High-Dimensional Multivariate Normal Probability Computation in Confidence Region Detection Applications.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

High Throughput Massive MIMO Signal Decoding Using Multi-Level Tree Search on FPGAs.
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024

2023
Exploiting temporal data reuse and asynchrony in the reverse time migration.
Int. J. High Perform. Comput. Appl., March, 2023

Author Correction: The high-dimensional space of human diseases built from diagnosis records and mapped to genetic loci.
Nat. Comput. Sci., 2023

The high-dimensional space of human diseases built from diagnosis records and mapped to genetic loci.
Nat. Comput. Sci., 2023

Tile low-rank approximations of non-Gaussian space and space-time Tukey g-and-h random field likelihoods and predictions on large-scale systems.
J. Parallel Distributed Comput., 2023

Steering Customized AI Architectures for HPC Scientific Applications.
Proceedings of the High Performance Computing - 38th International Conference, 2023

GPU-Based Low-Precision Detection Approach for Massive MIMO Systems.
Proceedings of the High Performance Computing - 38th International Conference, 2023

Scaling the "Memory Wall" for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems.
Proceedings of the International Conference for High Performance Computing, 2023

High-Performance SVD Partial Spectrum Computation.
Proceedings of the International Conference for High Performance Computing, 2023

Signal Detection for Large MIMO Systems Using Sphere Decoding on FPGAs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Efficient GPU-based Large MIMO Detection Algorithm for Next-Generation Communication Systems.
Proceedings of the IEEE Global Communications Conference, 2023

Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC.
IEEE Trans. Parallel Distributed Syst., 2022

High-performance 3D Unstructured Mesh Deformation Using Rank Structured Matrix Computations.
ACM Trans. Parallel Comput., 2022

Responsibly Reckless Matrix Algorithms for HPC Scientific Applications.
Comput. Sci. Eng., 2022

Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Parallel space-time likelihood optimization for air pollution prediction on large-scale systems.
Proceedings of the PASC '22: Platform for Advanced Scientific Computing Conference, Basel, Switzerland, June 27, 2022

Parallel Approximations of the Tukey g-and-h Likelihoods and Predictions for Non-Gaussian Geostatistics.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

A Framework to Exploit Data Sparsity in Tile Low-Rank Cholesky Factorization.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

High-Performance Spatial Data Compression for Scientific Applications.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022

2021
High Performance Multivariate Geospatial Statistics on Manycore Systems.
IEEE Trans. Parallel Distributed Syst., 2021

Accelerating Seismic Redatuming Using Tile Low-Rank Approximations on NEC SX-Aurora TSUBASA.
Supercomput. Front. Innov., 2021

High-Performance Partial Spectrum Computation for Symmetric eigenvalue problems and the SVD.
CoRR, 2021

Meeting the real-time challenges of ground-based telescopes using low-rank matrix computations.
Proceedings of the International Conference for High Performance Computing, 2021

Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Outsmarting the Atmospheric Turbulence for Ground-Based Telescopes Using the Stochastic Levenberg-Marquardt Method.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020
Abstraction Layer For Standardizing APIs of Task-Based Engines.
IEEE Trans. Parallel Distributed Syst., 2020

Asynchronous computations for solving the acoustic wave propagation equation.
Int. J. High Perform. Comput. Appl., 2020

High Performance Multivariate Spatial Modeling for Geostatistical Data on Manycore Systems.
CoRR, 2020

Performance / Complexity Trade-offs of the Sphere Decoder Algorithm for Massive MIMO Systems.
CoRR, 2020

Solving Acoustic Boundary Integral Equations Using High Performance Tile Low-Rank LU Factorization.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications.
Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020

Maximizing I/O Bandwidth for Reverse Time Migration on Heterogeneous Large-Scale Systems.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019
Massively Parallel Polar Decomposition on Distributed-memory Systems.
ACM Trans. Parallel Comput., 2019

A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems.
ACM Trans. Math. Softw., 2019

Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs.
ACM Trans. Math. Softw., 2019

ExaGeoStatR: A Package for Large-Scale Geostatistics in R.
CoRR, 2019

Mixed-Precision Tomographic Reconstructor Computations on Hardware Accelerators.
Proceedings of the 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2019

Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

MLBS: Transparent Data Caching in Hierarchical Storage for Out-of-Core HPC Applications.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Geostatistical Modeling and Prediction Using Mixed Precision Tile Cholesky Factorization.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Leveraging Task-Based Polar Decomposition Using PARSEC on Massively Parallel Systems.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Asynchronous Task-Based Execution of the Reverse Time Migration for the Oil and Gas Industry.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
Asynchronous Task-Based Polar Decomposition on Single Node Manycore Architectures.
IEEE Trans. Parallel Distributed Syst., 2018

ExaGeoStat: A High Performance Unified Software for Geostatistics on Manycore Systems.
IEEE Trans. Parallel Distributed Syst., 2018

Multidimensional Intratile Parallelization for Memory-Starved Stencil Computations.
ACM Trans. Parallel Comput., 2018

Accelerated Cyclic Reduction: A distributed-memory fast solver for structured linear systems.
Parallel Comput., 2018

Batched QR and SVD algorithms on GPUs with applications in hierarchical matrix compression.
Parallel Comput., 2018

Tile Low-Rank Approximation of Large-Scale Maximum Likelihood Estimation on Manycore Architectures.
CoRR, 2018

Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2018

Real-Time Massively Distributed Multi-object Adaptive Optics Simulations for the European Extremely Large Telescope.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Tile Low-Rank GEMM Using Batched Operations on GPUs.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Exploiting Data Sparsity for Large-Scale Matrix Computations.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Parallel Approximation of the Maximum Likelihood Estimation for the Prediction of Large-Scale Geostatistics Simulations.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
Trends in Data Locality Abstractions for HPC Systems.
IEEE Trans. Parallel Distributed Syst., 2017

ExaGeoStat: A High Performance Unified Framework for Geostatistics on Manycore Systems.
CoRR, 2017

A framework for dense triangular matrix kernels on various manycore architectures.
Concurr. Comput. Pract. Exp., 2017

Tile Low Rank Cholesky Factorization for Climate/Weather Modeling Applications on Manycore Architectures.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

2016
A High Performance QDWH-SVD Solver Using Hardware Accelerators.
ACM Trans. Math. Softw., 2016

KBLAS: An Optimized Library for Dense Matrix-Vector Multiplication on GPU Accelerators.
ACM Trans. Math. Softw., 2016

Accelerated Dimension-Independent Adaptive Metropolis.
SIAM J. Sci. Comput., 2016

Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.
Concurr. Comput. Pract. Exp., 2016

Adaptive Optics Simulation for the World's Largest Telescope on Multicore Architectures with Multiple GPUs.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2016

Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Efficient Sphere Detector Algorithm for Massive MIMO using GPU Hardware Accelerator.
Proceedings of the International Conference on Computational Science 2016, 2016

High Performance Polar Decomposition on Distributed Memory Systems.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Redesigning Triangular Dense Matrix Computations on GPUs.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Dense Matrix Computations on NUMA Architectures with Distance-Aware Work Stealing.
Supercomput. Front. Innov., 2015

Multicore-Optimized Wavefront Diamond Blocking for Optimizing Stencil Updates.
SIAM J. Sci. Comput., 2015

Multi-dimensional intra-tile parallelization for memory-starved stencil computations.
CoRR, 2015

High Performance Multi-GPU SpMV for Multi-component PDE-Based Applications.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

2014
Power profiling of Cholesky and QR factorizations on distributed memory systems.
Comput. Sci. Res. Dev., 2014

Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.
CoRR, 2014

Data-driven execution of fast multipole methods.
Concurr. Comput. Pract. Exp., 2014

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting.
Concurr. Comput. Pract. Exp., 2014

Pipelining Computational Stages of the Tomographic Reconstructor for Multi-Object Adaptive Optics on a Multi-GPU System.
Proceedings of the International Conference for High Performance Computing, 2014

High Performance Pseudo-analytical Simulation of Multi-Object Adaptive Optics over Multi-GPU Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures.
ACM Trans. Math. Softw., 2013

2012
Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem.
SIAM J. Sci. Comput., 2012

Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency.
Comput. Sci. Res. Dev., 2012

Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators.
Proceedings of the High Performance Computing for Computational Science, 2012

A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Systematic Approach in Optimizing Numerical Memory-Bound Kernels on GPU.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012

2011
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.
Concurr. Comput. Pract. Exp., 2011

Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels.
Proceedings of the Conference on High Performance Computing Networking, 2011

High performance matrix inversion based on LU factorization for multicore architectures.
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, 2011

Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Exploiting Fine-Grain Parallelism in Recursive LU Factorization.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

LU factorization for accelerator-based systems.
Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011

2010
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2010

Scheduling two-sided transformations using tile algorithms on multicore architectures.
Sci. Program., 2010

Scheduling dense linear algebra operations on multicore processors.
Concurr. Comput. Pract. Exp., 2010

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems.
Proceedings of the Conference on High Performance Computing Networking, 2010

Dense linear algebra solvers for multicore with GPU accelerators.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Tile QR factorization with parallel panel processing for multicore architectures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

2009
A parallel Aitken-additive Schwarz waveform relaxation suitable for the grid.
Parallel Comput., 2009

Comparative study of one-sided factorizations with multiple software packages on multi-core hardware.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

2008
Fault tolerant algorithms for heat transfer problems.
J. Parallel Distributed Comput., 2008

Scheduling for Numerical Linear Algebra Library at Scale.
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

2006
Parallel Fault Tolerant Algorithms for Parabolic Problems.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

