Pedro Alonso

ORCID: 0000-0002-6882-6592

  • Polytechnic University of Valencia, Spain

According to our database, Pedro Alonso authored at least 101 papers between 2000 and 2025.

Acceleration of the MVS workflow using graphics processors.
J. Supercomput., January, 2025

Automatic generation of ARM NEON micro-kernels for matrix multiplication.
J. Supercomput., July, 2024

Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
ACM Trans. Math. Softw., March, 2024

Acceleration of the Pre-processing Stage of the MVS Workflow using Graphics Processors.
Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, 2024

A caching mechanism to exploit object store speed in High Energy Physics analysis.
Clust. Comput., October, 2023

Efficient GPU implementation of a Boltzmann-Schrödinger-Poisson solver for the simulation of nanoscale DG MOSFETs.
J. Supercomput., August, 2023

Efficient and portable Winograd convolutions for multi-core processors.
J. Supercomput., July, 2023

Parallel border tracking in binary images for multicore computers.
J. Supercomput., June, 2023

Euler polynomials for the matrix exponential approximation.
J. Comput. Appl. Math., June, 2023

Leveraging an open source serverless framework for high energy physics computing.
J. Supercomput., May, 2023

Micro-kernels for portable and efficient matrix multiplication in deep learning.
J. Supercomput., May, 2023

Leveraging State-of-the-Art Engines for Large-Scale Data Analysis in High Energy Physics.
J. Grid Comput., March, 2023

Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
CoRR, 2023

Computing the Matrix Logarithm with the Romberg Integration Method.
Algorithms, 2023

Parallel signal detection for generalized spatial modulation MIMO systems.
J. Supercomput., 2022

Parallel border tracking in binary images using GPUs.
J. Supercomput., 2022

New Hermite series expansion for computing the matrix hyperbolic cosine.
J. Comput. Appl. Math., 2022

On Bernoulli matrix polynomials and matrix exponential approximation.
J. Comput. Appl. Math., 2022

Two Taylor Algorithms for Computing the Action of the Matrix Exponential on a Vector.
Algorithms, 2022

Convolution Operators for Deep Learning Inference on the Fujitsu A64FX Processor.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Performance Analysis of Convolution Algorithms for Deep Learning on Edge Processors.
Proceedings of Parallel Processing and Applied Mathematics, 2022

A Serverless Engine for High Energy Physics Distributed Analysis.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors.
J. Supercomput., 2021

Efficient update of determinants for many-electron wave function overlaps.
Comput. Phys. Commun., 2021

High Performance and Energy Efficient Integer Matrix Multiplication for Deep Learning.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Distributed Parallel Analysis Engine for High Energy Physics Using AWS Lambda.
Proceedings of the 1st Workshop on High Performance Serverless Computing (HiPS@HPDC 2021), 2021

Performance modeling of the sparse matrix-vector product via convolutional neural networks.
J. Supercomput., 2020

High Performance and Portable Convolution Operators for ARM-based Multicore Processors.
CoRR, 2020

A pipeline for the QR update in digital signal processing.
Comput. Math. Methods, 2020

High Performance and Portable Convolution Operators for Multicore Processors.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Real-time Soundprism.
J. Supercomput., 2019

A pipeline structure for the block QR update in digital signal processing.
J. Supercomput., 2019

HReMAS: hybrid real-time musical alignment system.
J. Supercomput., 2019

Exploring hybrid parallel systems for probabilistic record linkage.
J. Supercomput., 2019

Fast block QR update in digital signal processing.
J. Supercomput., 2019

Computing matrix trigonometric functions with GPUs through Matlab.
J. Supercomput., 2019

Fast Taylor polynomial evaluation for the computation of the matrix cosine.
J. Comput. Appl. Math., 2019

An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations.
J. Comput. Appl. Math., 2019

Online score-informed source separation in polyphonic mixtures using instrument spectral patterns.
Comput. Math. Methods, 2019

Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors.
Parallel Comput., 2018

A new efficient and accurate spline algorithm for the matrix exponential computation.
J. Comput. Appl. Math., 2018

Automatic tuning to performance modelling of matrix polynomials on multicore and multi-GPU systems.
J. Supercomput., 2017

Accelerating multi-channel filtering of audio signal on ARM processors.
J. Supercomput., 2017

An efficient musical accompaniment parallel system for mobile devices.
J. Supercomput., 2017

High-performance computing: the essential tool and the essential challenge.
J. Supercomput., 2017

Parallel online time warping for real-time audio-to-score alignment in multi-core systems.
J. Supercomput., 2017

Efficient and accurate algorithms for computing matrix trigonometric functions.
J. Comput. Appl. Math., 2017

Two algorithms for computing the matrix cosine function.
Appl. Math. Comput., 2017

Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

A fast band-Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors.
J. Comput. Phys., 2016

Implementation of the Beamformer Algorithm for the NVIDIA Jetson.
Proceedings of Algorithms and Architectures for Parallel Processing, 2016

Time and energy modeling of high-performance Level-3 BLAS on x86 architectures.
Simul. Model. Pract. Theory, 2015

Solving time-invariant differential matrix Riccati equations using GPGPU computing.
J. Supercomput., 2014

Automatic routine tuning to represent landform attributes on multicore and multi-GPU systems.
J. Supercomput., 2014

Assessing Power Monitoring Approaches for Energy and Power Analysis of Computers.
Sustain. Comput. Informatics Syst., 2014

Block pivoting implementation of a symmetric Toeplitz solver.
J. Parallel Distributed Comput., 2014

Modeling power and energy of the task-parallel Cholesky factorization on multicore processors.
Comput. Sci. Res. Dev., 2014

Modeling power and energy consumption of dense matrix factorizations on multicore processors.
Concurr. Comput. Pract. Exp., 2014

Enhancing performance and energy consumption of runtime schedulers for dense linear algebra.
Concurr. Comput. Pract. Exp., 2014

A multicore solution to Block-Toeplitz linear systems of equations.
J. Supercomput., 2013

Energy-efficient execution of dense linear algebra algorithms on multi-core processors.
Clust. Comput., 2013

Auto-tuning methodology to represent landform attributes on multicore and multi-GPU systems.
Proceedings of the 2013 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2013

Solving Some Mysteries in Power Monitoring of Servers: Take Care of Your Wattmeters!
Proceedings of Energy Efficiency in Large Scale Distributed Systems, 2013

Runtime Scheduling of the LU Factorization: Performance and Energy.
Proceedings of Energy Efficiency in Large Scale Distributed Systems, 2013

Heterogeneous Computational Model for Landform Attributes Representation on Multicore and Multi-GPU Systems.
Proceedings of the International Conference on Computational Science, 2012

DVFS-control techniques for dense linear algebra operations on multi-core processors.
Comput. Sci. Res. Dev., 2012

Solving systems of symmetric Toeplitz tridiagonal equations: Rojo's algorithm revisited.
Appl. Math. Comput., 2012

Auto-Tuning Methodology to Represent Landform Attributes on Multicore and Multi-GPU Systems.
Proceedings of the 13th Symposium on Computer Systems, 2012

Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Reducing Energy Consumption of Dense Linear Algebra Operations on Hybrid CPU-GPU Platforms.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012

Tools for Power-Energy Modelling and Analysis of Parallel Scientific Applications.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Parallel Algorithm for Landform Attributes Representation on Multicore and Multi-GPU Systems.
Proceedings of Computational Science and Its Applications (ICCSA 2012), 2012

Implementation and tuning of a parallel symmetric Toeplitz eigensolver.
J. Parallel Distributed Comput., 2011

Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors.
Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium, 2011

Efficient Simulation of Spatio-temporal Dynamics in Ultrasonic Resonators.
Proceedings of Advances in Computational Intelligence, 2011

Improving power efficiency of dense linear algebra algorithms on multi-core processors via slack control.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Experimental Study of Six Different Implementations of Parallel Matrix Multiplication on Heterogeneous Computational Clusters of Multicore Processors.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

HeteroPBLAS: A Set of Parallel Basic Linear Algebra Subprograms Optimized for Heterogeneous Computational Clusters.
Scalable Comput. Pract. Exp., 2009

A GPU Approach to the Simulation of Spatio-temporal Dynamics in Ultrasonic Resonators.
Proceedings of Parallel Processing and Applied Mathematics, 2009

Partial Data Replication as a Strategy for Parallel Computing of the Multilevel Discrete Wavelet Transform.
Proceedings of Parallel Processing and Applied Mathematics, 2009

Parallel solvers for dense linear systems for heterogeneous computational clusters.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A multilevel parallel algorithm to solve symmetric Toeplitz linear systems.
J. Supercomput., 2008

Parallel computation of the eigenvalues of symmetric Toeplitz matrices through iterative methods.
J. Parallel Distributed Comput., 2008

Using Laplace and angular measures for Feature Selection in Text Categorisation.
Int. J. Adv. Intell. Paradigms, 2008

A Threaded Divide and Conquer Symmetric Tridiagonal Eigensolver on Multicore Systems.
Proceedings of the 7th International Symposium on Parallel and Distributed Computing (ISPDC 2008), 2008

Heterogeneous PBLAS: Optimization of PBLAS for Heterogeneous Computational Clusters.
Proceedings of the 7th International Symposium on Parallel and Distributed Computing (ISPDC 2008), 2008

Scalable Dense Factorizations for Heterogeneous Computational Clusters.
Proceedings of the 7th International Symposium on Parallel and Distributed Computing (ISPDC 2008), 2008

A Pipelined Parallel Algorithm for OSIC Decoding.
Proceedings of Parallel Processing and Applied Mathematics, 2007

An Adaptive Interface for the Efficient Computation of the Discrete Sine Transform.
Proceedings of Parallel Processing and Applied Mathematics, 2007

A Parallel Solution of Hermitian Toeplitz Linear Systems.
Proceedings of Computational Science, 2006

Sequential and Parallel Algorithms for the Inverse Toeplitz Singular Value Problem.
Proceedings of the 2006 International Conference on Scientific Computing, 2006

A Parallel Algorithm for the Solution of the Deconvolution Problem on Heterogeneous Networks.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

An Efficient Parallel Algorithm to Solve Block-Toeplitz Systems.
J. Supercomput., 2005

Solving the block-Toeplitz least-squares problem in parallel.
Concurr. Pract. Exp., 2005

An Efficient Parallel Solution of Complex Toeplitz Linear Systems.
Proceedings of Parallel Processing and Applied Mathematics, 2005

The Symmetric-Toeplitz Linear System Problem in Parallel.
Proceedings of Computational Science, 2005

Designing polylibraries to speed up linear algebra computations.
Int. J. High Perform. Comput. Netw., 2004

An Efficient and Stable Parallel Solution for Non-symmetric Toeplitz Linear Systems.
Proceedings of High Performance Computing for Computational Science, 2004

Parallel Algorithms for the Solution of Toeplitz Systems of Linear Equations.
Proceedings of Parallel Processing and Applied Mathematics, 2003

A Parallel Algorithm for Solving the Toeplitz Least Squares Problem.
Proceedings of Vector and Parallel Processing, 2000
