Pablo Ezzatti

Orcid: 0000-0002-2368-8907

According to our database, Pablo Ezzatti authored at least 107 papers between 2009 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Tuning high-level synthesis SpMV kernels in Alveo FPGAs.
Microprocess. Microsystems, 2024

Leveraging index compression techniques to optimize the use of co-processors.
J. Comput. Sci. Technol., 2024

A new level-set analysis and sparse storage format for the SPTRSV in GPUs.
Proceedings of the 36th IEEE International Symposium on Computer Architecture and High Performance Computing, 2024

Avoiding Training in the Platform-Aware Optimization Process for Faster DNN Latency Reduction.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

A GPU method for the analysis stage of the SPTRSV kernel.
J. Supercomput., September, 2023

Advancing on an efficient sparse matrix multiplication kernel for modern GPUs.
Concurr. Comput. Pract. Exp., 2023

Assessing the Performance of an Architecture-Aware Optimization Tool for Neural Networks.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

Evaluation of architecture-aware optimization techniques for Convolutional Neural Networks.
Proceedings of the 31st Euromicro International Conference on Parallel, 2023

Trajectory-based Metaheuristics for Improving Sparse Matrix Storage.
Proceedings of the IEEE Latin American Conference on Computational Intelligence, 2023

Towards Reducing Communications in Sparse Matrix Kernels.
Proceedings of the Cloud Computing, Big Data & Emerging Topics - 11th Conference, 2023

Sparse Matrix-Vector Product for the bmSparse Matrix Format in GPUs.
Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

Enhancing the Sparse Matrix Storage Using Reordering Techniques.
Proceedings of the High Performance Computing - 10th Latin American Conference, 2023

Refactoring an Electric-Market Simulation Software for Massively Parallel Computations.
Proceedings of the High Performance Computing - 9th Latin American Conference, 2022

Time-Power-Energy Balance of BLAS Kernels in Modern FPGAs.
Proceedings of the High Performance Computing - 9th Latin American Conference, 2022

Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators.
J. Supercomput., 2021

Machine learning for optimal selection of sparse triangular system solvers on GPUs.
J. Parallel Distributed Comput., 2021

Energy-efficient algebra kernels in FPGA for High Performance Computing.
J. Comput. Sci. Technol., 2021

Selecting optimal SpMV realizations for GPUs via machine learning.
Int. J. High Perform. Comput. Appl., 2021

Unleashing the computational power of FPGAs to efficiently perform SPMV operation.
Proceedings of the 40th International Conference of the Chilean Computer Science Society, 2021

Unleashing the performance of bmSparse for the sparse matrix multiplication in GPUs.
Proceedings of the 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2021

Optimizing Sparse Matrix Storage for the Big Data Era.
Proceedings of the Cloud Computing, Big Data & Emerging Topics - 9th Conference, 2021

Towards an Efficient Sparse Storage Format for the SpMM Kernel in GPUs.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Assessing the solution of one sparse triangular linear system on multi-many core platforms.
Proceedings of the XLVII Latin American Computing Conference, 2021

Proximity tracing applications for COVID-19: data privacy and security.
Proceedings of the XLVII Latin American Computing Conference, 2021

Improving the performance of graph database queries using linear algebra operations.
Proceedings of the XLVII Latin American Computing Conference, 2021

Using analysis information in the synchronization-free GPU solution of sparse triangular systems.
Concurr. Comput. Pract. Exp., 2020

An asynchronous computation architecture for enhancing the performance of the Weather Research and Forecasting model.
Concurr. Comput. Pract. Exp., 2020

Estimating the parallelism in the solution of sparse triangular linear systems.
Proceedings of the 39th International Conference of the Chilean Computer Science Society, 2020

Understanding the Performance of Elementary NLA Kernels in FPGAs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2020

An efficient GPU version of the preconditioned GMRES method.
J. Supercomput., 2019

Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations.
Parallel Comput., 2019

Power-aware computing.
Concurr. Comput. Pract. Exp., 2019

A GPU-aware mixed-precision solver for low-rank algebraic Riccati equations.
Concurr. Comput. Pract. Exp., 2019

Avoiding Synchronization to Accelerate a CFD Solver in GPU.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Automatic Selection of Sparse Triangular Linear System Solvers on GPUs through Machine Learning Techniques.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Towards a Lightweight Method to Predict the Performance of Sparse Triangular Solvers on Heterogeneous Hardware Platforms.
Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

Accelerating the Calculation of Friedman Test Tables on Many-Core Processors.
Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

Solving Sparse Triangular Linear Systems in Modern GPUs: A Synchronization-Free Algorithm.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Task Parallelism in the WRF Model Through Computation Offloading to Many-Core Devices.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

A New GPU Algorithm to Compute a Level Set-Based Analysis for the Parallel Solution of Sparse Triangular Systems.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Extending ILUPACK with a GPU Version of the BiCGStab Method.
Proceedings of the XLIV Latin American Computer Conference, 2018

A comparison of various schemes for solving the transport equation in many-core platforms.
J. Supercomput., 2017

Extending the Gauss-Huard method for the solution of Lyapunov matrix equations and matrix inversion.
Concurr. Comput. Pract. Exp., 2017

Assessing Sparse Triangular Linear System Solvers on GPUs.
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Overcoming Memory-Capacity Constraints in the Use of ILUPACK on Graphics Processors.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Solving Sparse Differential Riccati Equations on Hybrid CPU-GPU Platforms.
Proceedings of the Computational Science and Its Applications - ICCSA 2017, 2017

A VNS with Parallel Evaluation of Solutions for the Inverse Lighting Problem.
Proceedings of the Applications of Evolutionary Computation - 20th European Conference, 2017

Evaluating the NVIDIA Tegra Processor as a Low-Power Alternative for Sparse GPU Computations.
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

Exploiting task and data parallelism in ILUPACK's preconditioned CG solver on NUMA architectures and many-core accelerators.
Parallel Comput., 2016

Energy-aware solution of linear systems with many right hand sides.
Comput. Sci. Res. Dev., 2016

Characterizing the efficiency of multicore and manycore processors for the solution of sparse linear systems.
Comput. Sci. Res. Dev., 2016

Balancing Energy and Performance in Dense Linear System Solvers for Hybrid ARM+GPU platforms.
CLEI Electron. J., 2016

Accelerating the quality measurement of DNA with GPUs.
Proceedings of the 35th International Conference of the Chilean Computer Science Society, 2016

Accelerating an IEEE 802.11 a/g/p Transceiver in GNU Radio.
Proceedings of the 9th Latin America Networking Conference, 2016

Unleashing the Graphic Processing Units-Based Version of NAMD.
Proceedings of the Bioinformatics and Biomedical Engineering, 2016

Accelerating the resolution of generalized Lyapunov matrix equations on hybrid architectures.
Proceedings of the International Conference on High Performance Computing & Simulation, 2016

The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

A Data-Parallel ILUPACK for Sparse General and Symmetric Indefinite Linear Systems.
Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Overview of HPC benchmarks in hybrid hardware platforms (CPUs+GPUs).
Proceedings of the XLII Latin American Computing Conference, 2016

Assessing the explicit finite difference method on a massive parallel platform.
Proceedings of the XLII Latin American Computing Conference, 2016

Taking advantage of HPC techniques in the operational forecast of the Río de la Plata.
Proceedings of the XLII Latin American Computing Conference, 2016

Design of a Task-Parallel Version of ILUPACK for Graphics Processors.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

Extending Lyapack for the solution of band Lyapunov equations on hybrid CPU-GPU platforms.
J. Supercomput., 2015

Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction.
Clust. Comput., 2015

Painless Parallelism on Heterogeneous Hardware Leveraging the Functional Paradigm.
Proceedings of the 2015 International Symposium on Computer Architecture and High Performance Computing Workshops, 2015

Accelerating the Min-Min Heuristic.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Revisiting the Gauss-Huard Algorithm for the Solution of Linear Systems on Graphics Accelerators.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

A Parallel Multi-threaded Solver for Symmetric Positive Definite Bordered-Band Linear Systems.
Proceedings of the Parallel Processing and Applied Mathematics, 2015

Exploring the Offload Execution Model in the Intel Xeon Phi via Matrix Inversion.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Solving dense linear systems with hybrid ARM+GPU platforms.
Proceedings of the 2015 Latin American Computing Conference, 2015

Solving Linear Systems on the Intel Xeon-Phi Accelerator via the Gauss-Huard Algorithm.
Proceedings of the High Performance Computing - Second Latin American Conference, 2015

Another step to the full GPU implementation of the weather research and forecasting model.
J. Supercomput., 2014

A factored variant of the Newton iteration for the solution of algebraic Riccati equations via the matrix sign function.
Numer. Algorithms, 2014

Trading Off Performance for Energy in Linear Algebra Operations with Applications in Control Theory.
CLEI Electron. J., 2014

Leveraging Data-Parallelism in ILUPACK using Graphics Processors.
Proceedings of the IEEE 13th International Symposium on Parallel and Distributed Computing, 2014

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction.
Proceedings of the Computational Science and Its Applications - ICCSA 2014 - 14th International Conference, Guimarães, Portugal, June 30, 2014

Accelerating the general band matrix multiplication using graphics processors.
Proceedings of the XL Latin American Computing Conference, 2014

Efficient Symmetric Band Matrix-Matrix Multiplication on GPUs.
Proceedings of the High Performance Computing - First HPCLATAM, 2014

Accelerating the Lyapack library using GPUs.
J. Supercomput., 2013

An efficient implementation of the Min-Min heuristic.
Comput. Oper. Res., 2013

Matrix inversion on CPU-GPU platforms with applications in control theory.
Concurr. Comput. Pract. Exp., 2013

Solving Matrix Equations on Multi-Core and Many-Core Architectures.
Algorithms, 2013

Exploiting Data- and Task-Parallelism in the Solution of Riccati Equations on Multicore Servers and GPUs.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Towards a functional run-time for dense NLA domain.
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing, 2013

On the Impact of Optimization on the Time-Power-Energy Balance of Dense Linear Algebra Factorizations.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Towards a finite volume model on a many-core platform.
Int. J. High Perform. Syst. Archit., 2012

High Performance Implementations of the BST Method on Hybrid CPU-GPU Platforms.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

GPU Acceleration of the caffa3d.MB Model.
Proceedings of the Computational Science and Its Applications - ICCSA 2012, 2012

Low-rank Radiosity using Sparse Matrices.
Proceedings of the GRAPP & IVAPP 2012: Proceedings of the International Conference on Computer Graphics Theory and Applications and International Conference on Information Visualization Theory and Applications, 2012

Unleashing CPU-GPU Acceleration for Control Theory Applications.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Accelerating radiative heat transfer calculations on modern hardware.
Proceedings of the 2012 XXXVIII Conferencia Latinoamericana En Informatica (CLEI), 2012

Using graphics processors to accelerate the computation of the matrix inverse.
J. Supercomput., 2011

A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU-GPU platforms.
Parallel Comput., 2011

An efficient version of the RMA-11 model.
CLEI Electron. J., 2011

A GPU Implementation of the SIP Method.
Proceedings of the 30th International Conference of the Chilean Computer Science Society, 2011

A Study on the Implementation of Tridiagonal Systems Solvers Using a GPU.
Proceedings of the 30th International Conference of the Chilean Computer Science Society, 2011

Accelerating BST Methods for Model Reduction with Graphics Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2011

High Performance Matrix Inversion on a Multi-core Platform with Several GPUs.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

High performance matrix inversion of SPD matrices on graphics processors.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Efficient Model Order Reduction of Large-Scale Systems on Multi-core Platforms.
Proceedings of the Computational Science and Its Applications - ICCSA 2011, 2011

Improving the Performance of a Ray Tracing Algorithm Using a GPU.
Proceedings of the SCCC 2010, 2010

Accelerating Model Reduction of Large Linear Systems with Graphics Processors.
Proceedings of the Applied Parallel and Scientific Computing, 2010

PUGACE, a cellular Evolutionary Algorithm framework on GPUs.
Proceedings of the IEEE Congress on Evolutionary Computation, 2010

Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function.
Proceedings of the Euro-Par 2009, 2009
