Sandra Catalán

ORCID: 0000-0002-9321-2728

According to our database, Sandra Catalán authored at least 50 papers between 2013 and 2025.

Bibliography

2025
Mixed-precision pre-pivoting strategy for the LU factorization.
J. Supercomput., January, 2025

2024
Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures.
J. Syst. Archit., 2024

Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
Int. J. High Perform. Comput. Appl., 2024

Inference with Transformer Encoders on ARM and RISC-V Multicore Processors.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

2023
Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures.
J. Parallel Distributed Comput., May, 2023

Co-Design of the Dense Linear Algebra Software Stack for Multicore Processors.
CoRR, 2023

Fine-grain task-parallel algorithms for matrix factorizations and inversion on many-threaded CPUs.
Concurr. Comput. Pract. Exp., 2023

Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.
Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, 2023

2022
QR Factorization Using Malleable BLAS on Multicore Processors.
Proceedings of the High Performance Computing: ISC High Performance 2022 International Workshops, Hamburg, Germany, May 29, 2022

NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

2021
Leveraging teaching on demand: Approaching HPC to undergrads.
J. Parallel Distributed Comput., 2021

A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs.
PMAM@PPoPP 2021: Proceedings of the Twelfth International Workshop on Programming Models and Applications for Multicores and Manycores, 2021

Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library).
J. Parallel Distributed Comput., 2020

Programming parallel dense matrix factorizations with look-ahead and OpenMP.
Clust. Comput., 2020

Towards an Auto-Tuned and Task-Based SpMV (LASs Library).
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

2019
Dynamic look-ahead in the reduction to band form for the singular value decomposition.
Parallel Comput., 2019

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD.
Numer. Algorithms, 2019

A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting.
IEEE Access, 2019

Teaching on Demand: an HPC Experience.
Proceedings of the 2019 IEEE/ACM Workshop on Education for High-Performance Computing, 2019

BLAS-3 Optimized by OmpSs Regions (LASs Library).
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Tasking in Accelerators: Performance Evaluation.
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

Accelerating Conjugate Gradient using OmpSs.
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

2018
Multithreaded Dense Linear Algebra on Asymmetric Multi-core Processors.
PhD thesis, 2018

Static scheduling of the LU factorization with look-ahead on asymmetric multicore processors.
Parallel Comput., 2018

Energy balance between voltage-frequency scaling and resilience for linear algebra routines on low-power multicore architectures.
Parallel Comput., 2018

Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors.
Parallel Comput., 2018

Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors.
J. Comput. Sci., 2018

Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

2017
Time and energy modeling of a high-performance multi-threaded Cholesky factorization.
J. Supercomput., 2017

Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations.
Parallel Comput., 2017

Two-Sided Reduction to Compact Band Forms with Look-Ahead.
CoRR, 2017

Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Static Versus Dynamic Task Scheduling of the LU Factorization on ARM big.LITTLE Architectures.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
An analytical methodology to derive power models based on hardware and software metrics.
Comput. Sci. Res. Dev., 2016

Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics.
IET Comput. Digit. Tech., 2016

Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors.
Clust. Comput., 2016

Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Time and energy modeling of high-performance Level-3 BLAS on x86 architectures.
Simul. Model. Pract. Theory, 2015

Evaluating the performance and energy efficiency of the COSMO-ART model system.
Comput. Sci. Res. Dev., 2015

Reducing the cost of power monitoring with DC wattmeters.
Comput. Sci. Res. Dev., 2015

Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors.
CoRR, 2015

Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors.
CoRR, 2015

Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

2014
Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers.
Sustain. Comput. Informatics Syst., 2014

Automatic detection of power bottlenecks in parallel scientific applications.
Comput. Sci. Res. Dev., 2014

Analyzing the Energy Efficiency of the Memory Subsystem in Multicore Processors.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2014

2013
Solving Some Mysteries in Power Monitoring of Servers: Take Care of Your Wattmeters!
Proceedings of the Energy Efficiency in Large Scale Distributed Systems, 2013

