Ernesto Dufrechu

Proceedings of the 36th IEEE International Symposium on Computer Architecture and High Performance Computing, 2024

Avoiding Training in the Platform-Aware Optimization Process for Faster DNN Latency Reduction.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

A GPU method for the analysis stage of the SPTRSV kernel.

J. Supercomput., September, 2023

Advancing on an efficient sparse matrix multiplication kernel for modern GPUs.

Concurr. Comput. Pract. Exp., 2023

Assessing the Performance of an Architecture-Aware Optimization Tool for Neural Networks.

Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

Evaluation of architecture-aware optimization techniques for Convolutional Neural Networks.

Proceedings of the 31st Euromicro International Conference on Parallel, 2023

Trajectory-based Metaheuristics for Improving Sparse Matrix Storage.

Proceedings of the IEEE Latin American Conference on Computational Intelligence, 2023

Towards Reducing Communications in Sparse Matrix Kernels.

Proceedings of the Cloud Computing, Big Data & Emerging Topics - 11th Conference, 2023

Sparse Matrix-Vector Product for the bmSparse Matrix Format in GPUs.

Gonzalo Berger

Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

Enhancing the Sparse Matrix Storage Using Reordering Techniques.

Manuel Freire

Sanderson L. Gonzaga de Oliveira

Proceedings of the High Performance Computing - 10th Latin American Conference, 2023

2022

Kaizen Programming for predicting numerical linear algebra operations performance.

Jimena Ferreira

Martín Pedemonte

Proceedings of the 2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2022

Refactoring an Electric-Market Simulation Software for Massively Parallel Computations.

Proceedings of the High Performance Computing - 9th Latin American Conference, 2022

Time-Power-Energy Balance of BLAS Kernels in Modern FPGAs.

Proceedings of the High Performance Computing - 9th Latin American Conference, 2022

2021

Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators.

J. Supercomput., 2021

Machine learning for optimal selection of sparse triangular system solvers on GPUs.

Manuel Freire

J. Parallel Distributed Comput., 2021

Energy-efficient algebra kernels in FPGA for High Performance Computing.

J. Comput. Sci. Technol., 2021

Selecting optimal SpMV realizations for GPUs via machine learning.

Int. J. High Perform. Comput. Appl., 2021

Accelerating advanced preconditioning methods on hybrid architectures.

CLEI Electron. J., 2021

Unleashing the performance of bmSparse for the sparse matrix multiplication in GPUs.

Proceedings of the 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2021

Optimizing Sparse Matrix Storage for the Big Data Era.

Proceedings of the Cloud Computing, Big Data & Emerging Topics - 9th Conference, 2021

Towards an Efficient Sparse Storage Format for the SpMM Kernel in GPUs.

Renzo Marini

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Assessing the solution of one sparse triangular linear system on multi-many core platforms.

Proceedings of the XLVII Latin American Computing Conference, 2021

2020

Using analysis information in the synchronization-free GPU solution of sparse triangular systems.

Concurr. Comput. Pract. Exp., 2020

Estimating the parallelism in the solution of sparse triangular linear systems.

Eduardo González

Proceedings of the 39th International Conference of the Chilean Computer Science Society, 2020

Understanding the Performance of Elementary NLA Kernels in FPGAs.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Exploring FPGA Optimizations to Compute Sparse Numerical Linear Algebra Kernels.

Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2020

2019

An efficient GPU version of the preconditioned GMRES method.

J. Supercomput., 2019

Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations.

Parallel Comput., 2019

A GPU-aware mixed-precision solver for low-rank algebraic Riccati equations.

Concurr. Comput. Pract. Exp., 2019

Avoiding Synchronization to Accelerate a CFD Solver in GPU.

Gabriel Usera

Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Automatic Selection of Sparse Triangular Linear System Solvers on GPUs through Machine Learning Techniques.

Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Towards a Lightweight Method to Predict the Performance of Sparse Triangular Solvers on Heterogeneous Hardware Platforms.

Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

Accelerating the Calculation of Friedman Test Tables on Many-Core Processors.

Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

2018

Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers.

Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

Solving Sparse Triangular Linear Systems in Modern GPUs: A Synchronization-Free Algorithm.

Proceedings of the 26th Euromicro International Conference on Parallel, 2018

A New GPU Algorithm to Compute a Level Set-Based Analysis for the Parallel Solution of Sparse Triangular Systems.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Extending ILUPACK with a GPU Version of the BiCGStab Method.

Proceedings of the XLIV Latin American Computer Conference, 2018

2017

Assessing Sparse Triangular Linear System Solvers on GPUs.

Daniel Erguiz

Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Overcoming Memory-Capacity Constraints in the Use of ILUPACK on Graphics Processors.

Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Solving Sparse Differential Riccati Equations on Hybrid CPU-GPU Platforms.

Proceedings of the Computational Science and Its Applications - ICCSA 2017, 2017

Evaluating the NVIDIA Tegra Processor as a Low-Power Alternative for Sparse GPU Computations.

Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

2016

Exploiting task and data parallelism in ILUPACK's preconditioned CG solver on NUMA architectures and many-core accelerators.

Parallel Comput., 2016

Characterizing the efficiency of multicore and manycore processors for the solution of sparse linear systems.

Comput. Sci. Res. Dev., 2016

Balancing Energy and Performance in Dense Linear System Solvers for Hybrid ARM+GPU platforms.

Juan Pablo Silva

Peter Benner

CLEI Electron. J., 2016

A Data-Parallel ILUPACK for Sparse General and Symmetric Indefinite Linear Systems.

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Taking advantage of HPC techniques in the operational forecast of the Río de la Plata.

Proceedings of the XLII Latin American Computing Conference, 2016

Design of a Task-Parallel Version of ILUPACK for Graphics Processors.

Proceedings of the High Performance Computing - Third Latin American Conference, 2016

2015

Extending Lyapack for the solution of band Lyapunov equations on hybrid CPU-GPU platforms.

J. Supercomput., 2015

Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction.

Peter Benner

Clust. Comput., 2015

Solving dense linear systems with hybrid ARM+GPU platforms.

Juan Pablo Silva

Peter Benner

Proceedings of the 2015 Latin American Computing Conference, 2015

Solving Linear Systems on the Intel Xeon-Phi Accelerator via the Gauss-Huard Algorithm.

Proceedings of the High Performance Computing - Second Latin American Conference, 2015

2014

Another step to the full GPU implementation of the weather research and forecasting model.

Alejandro Gutiérrez Arce

Gabriel Cazes

J. Supercomput., 2014

Leveraging Data-Parallelism in ILUPACK using Graphics Processors.

Proceedings of the IEEE 13th International Symposium on Parallel and Distributed Computing, 2014

Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction.

Proceedings of the Computational Science and Its Applications - ICCSA 2014 - 14th International Conference, Guimarães, Portugal, June 30, 2014

Accelerating the general band matrix multiplication using graphics processors.

Proceedings of the XL Latin American Computing Conference, 2014

Efficient Symmetric Band Matrix-Matrix Multiplication on GPUs.

Proceedings of the High Performance Computing - First HPCLATAM, 2014

2013

Accelerating the Lyapack library using GPUs.

Ernesto Dufrechu