Enrique S. Quintana-Ortí
ORCID: 0000-0002-5454-165X
Affiliations:
- Jaume I University, Castellón de la Plana, Spain
According to our database, Enrique S. Quintana-Ortí authored at least 389 papers between 1993 and 2024.
Links
Online presence:
- on zbmath.org
- on scopus.com
- on orcid.org
- on upv.es
- on dl.acm.org
Bibliography
2024
J. Supercomput., July, 2024
J. Supercomput., June, 2024
Algorithm 1039: Automatic Generators for a Family of Matrix Multiplication Routines with Apache TVM.
ACM Trans. Math. Softw., March, 2024
Hard SyDR: A Benchmarking Environment for Global Navigation Satellite System Algorithms.
Sensors, January, 2024
Microprocess. Microsystems, 2024
Parallel GEMM-based convolutions for deep learning on multicore ARM and RISC-V architectures.
J. Syst. Archit., 2024
Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors.
Int. J. High Perform. Comput. Appl., 2024
Parallel Reduced Order Modeling for Digital Twins using High-Performance Computing Workflows.
CoRR, 2024
Mapping Parallel Matrix Multiplication in GotoBLAS2 to the AMD Versal ACAP for Deep Learning.
CoRR, 2024
Acceleration of the Pre-processing Stage of the MVS Workflow using Graphics Processors.
Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, 2024
Proceedings of the Euro-Par 2024: Parallel Processing, 2024
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
2023
J. Supercomput., July, 2023
Int. J. High Perform. Comput. Appl., July, 2023
J. Supercomput., May, 2023
Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures.
J. Parallel Distributed Comput., May, 2023
Analyzing the impact of the MPI allreduce in distributed training of convolutional neural networks.
Computing, May, 2023
Int. J. High Perform. Comput. Appl., March, 2023
Reformulating the direct convolution for high-performance deep learning inference on ARM processors.
J. Syst. Archit., February, 2023
Using Ginkgo's memory accessor for improving the accuracy of memory-bound low precision BLAS.
Softw. Pract. Exp., 2023
GreenLightningAI: An Efficient AI System with Decoupled Structural and Quantitative Knowledge.
CoRR, 2023
CoRR, 2023
CoRR, 2023
Fine-grain task-parallel algorithms for matrix factorizations and inversion on many-threaded CPUs.
Concurr. Comput. Pract. Exp., 2023
Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors.
Concurr. Comput. Pract. Exp., 2023
Proceedings of the High Performance Computing, 2023
Automatic Generation of Micro-kernels for Performance Portability of Matrix Multiplication on RISC-V Vector Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the 31st Euromicro International Conference on Parallel, 2023
Proceedings of the International Conference on Localization and GNSS, 2023
Tall-and-Skinny QR Factorization for Clusters of GPUs Using High-Performance Building Blocks.
Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023
2022
ACM Trans. Math. Softw., 2022
A BLIS-like matrix multiplication for machine learning in the RISC-V ISA-based GAP8 processor.
J. Supercomput., 2022
High performance and energy efficient inference for deep learning on multicore ARM processors using general optimization techniques and BLIS.
J. Syst. Archit., 2022
Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors.
J. Parallel Distributed Comput., 2022
J. Comput. Sci., 2022
Int. J. High Perform. Comput. Appl., 2022
Enabling dynamic and intelligent workflows for HPC, data analytics, and AI convergence.
Future Gener. Comput. Syst., 2022
Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units.
Concurr. Comput. Pract. Exp., 2022
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022
NUMA-Aware Dense Matrix Factorizations and Inversion with Look-Ahead on Multicore Processors.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022
Proceedings of the Parallel Processing and Applied Mathematics, 2022
Towards Portable Realizations of Winograd-based Convolution with Vector Intrinsics and OpenMP.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022
Proceedings of the 30th Euromicro International Conference on Parallel, 2022
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022
2021
Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software.
ACM Trans. Math. Softw., 2021
Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors.
J. Supercomput., 2021
Factorized solution of generalized stable Sylvester equations using many-core GPU accelerators.
J. Supercomput., 2021
J. Supercomput., 2021
IEEE Trans. Computers, 2021
J. Parallel Distributed Comput., 2021
Int. J. High Perform. Comput. Appl., 2021
Introduction to the Special Issue related to the Power-Aware Computing Workshop 2019 - PACO 2019.
Int. J. High Perform. Comput. Appl., 2021
Comput. Phys. Commun., 2021
CoRR, 2021
Clust. Comput., 2021
A New Generation of Task-Parallel Algorithms for Matrix Inversion in Many-Threaded CPUs.
Proceedings of the PMAM@PPoPP 2021: Proceedings of the Twelfth International Workshop on Programming Models and Applications for Multicores and Manycores, 2021
High Performance and Energy Efficient Integer Matrix Multiplication for Deep Learning.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021
Evaluation of MPI Allreduce for Distributed Training of Convolutional Neural Networks.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021
Proceedings of the 29th Euromicro International Conference on Parallel, 2021
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
2020
ACM Trans. Parallel Comput., 2020
Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors.
J. Supercomput., 2020
J. Supercomput., 2020
Performance modeling of the sparse matrix-vector product via convolutional neural networks.
J. Supercomput., 2020
IEEE Trans. Computers, 2020
J. Comput. Appl. Math., 2020
Reproducibility of parallel preconditioned conjugate gradient in hybrid programming environments.
Int. J. High Perform. Comput. Appl., 2020
Reproducibility of Parallel Preconditioned Conjugate Gradient in Hybrid Programming Environments.
CoRR, 2020
High Performance and Portable Convolution Operators for ARM-based Multicore Processors.
CoRR, 2020
Clust. Comput., 2020
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the Euro-Par 2020: Parallel Processing, 2020
Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs.
Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020
2019
ACM Trans. Math. Softw., 2019
J. Supercomput., 2019
J. Supercomput., 2019
Dynamic look-ahead in the reduction to band form for the singular value decomposition.
Parallel Comput., 2019
Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors.
Parallel Comput., 2019
Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations.
Parallel Comput., 2019
Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD.
Numer. Algorithms, 2019
Erratum to "Exploiting nested task-parallelism in the H-LU factorization" [J. Comput. Sci. 33 (2019) 20-33].
J. Comput. Sci., 2019
Int. J. High Perform. Comput. Appl., 2019
Int. J. High Perform. Comput. Appl., 2019
Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers.
Concurr. Comput. Pract. Exp., 2019
A Case for Malleable Thread-Level Linear Algebra Libraries: The LU Factorization With Partial Pivoting.
IEEE Access, 2019
Automatic Selection of Sparse Triangular Linear System Solvers on GPUs through Machine Learning Techniques.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019
Proceedings of the 26th European MPI Users' Group Meeting, 2019
Structure-Aware Calculation of Many-Electron Wave Function Overlaps on Multicore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2019
Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2019
Cholesky and Gram-Schmidt Orthogonalization for Tall-and-Skinny QR Factorizations on Graphics Processors.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
2018
Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models.
J. Supercomput., 2018
Optimized Fundamental Signal Processing Operations For Energy Minimization on Heterogeneous Mobile Devices.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018
Parallel Comput., 2018
Static scheduling of the LU factorization with look-ahead on asymmetric multicore processors.
Parallel Comput., 2018
Energy balance between voltage-frequency scaling and resilience for linear algebra routines on low-power multicore architectures.
Parallel Comput., 2018
Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors.
Parallel Comput., 2018
Multi-threaded dense linear algebra libraries for low-power asymmetric multicore processors.
J. Comput. Sci., 2018
J. Comput. Biol., 2018
Int. J. High Perform. Comput. Appl., 2018
Residual Replacement in Mixed-Precision Iterative Refinement for Sparse Linear Systems.
Proceedings of the High Performance Computing, 2018
High-Performance GPU Implementation of PageRank with Reduced Precision Based on Mantissa Segmentation.
Proceedings of the 8th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2018
Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018
Proceedings of the 26th Euromicro International Conference on Parallel, 2018
Proceedings of the XLIV Latin American Computer Conference, 2018
2017
Modeling power consumption of 3D MPDATA and the CG method on ARM and Intel multicore architectures.
J. Supercomput., 2017
Time and energy modeling of a high-performance multi-threaded Cholesky factorization.
J. Supercomput., 2017
J. Supercomput., 2017
J. Supercomput., 2017
Adapting concurrency throttling and voltage-frequency scaling for dense eigensolvers.
J. Supercomput., 2017
GPU-Based Dynamic Wave Field Synthesis Using Fractional Delay Filters and Room Compensation.
IEEE ACM Trans. Audio Speech Lang. Process., 2017
Revisiting conventional task schedulers to exploit asymmetry in multi-core architectures for dense linear algebra operations.
Parallel Comput., 2017
Architecture-aware optimization of an HEVC decoder on asymmetric multicore processors.
J. Real Time Image Process., 2017
Extending the Gauss-Huard method for the solution of Lyapunov matrix equations and matrix inversion.
Concurr. Comput. Pract. Exp., 2017
Concurr. Comput. Pract. Exp., 2017
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017
Reduction to Tridiagonal Form for Symmetric Eigenproblems on Asymmetric Multicore Processors.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Static Versus Dynamic Task Scheduling of the LU Factorization on ARM big.LITTLE Architectures.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017
Proceedings of the 46th International Conference on Parallel Processing, 2017
Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning.
Proceedings of the 46th International Conference on Parallel Processing, 2017
Proceedings of the Computational Science and Its Applications - ICCSA 2017, 2017
Proceedings of the International Conference on Computational Science, 2017
On the Use of a GPU-Accelerated Mobile Device Processor for Sound Source Localization.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the Algorithms and Architectures for Parallel Processing, 2017
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
Evaluating the NVIDIA Tegra Processor as a Low-Power Alternative for Sparse GPU Computations.
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017
Proceedings of the Third IEEE International Conference on Multimedia Big Data, 2017
2016
ACM Trans. Math. Softw., 2016
Exploiting task and data parallelism in ILUPACK's preconditioned CG solver on NUMA architectures and many-core accelerators.
Parallel Comput., 2016
A fast band-Krylov eigensolver for macromolecular functional motion simulation on multicore architectures and graphics processors.
J. Comput. Phys., 2016
Characterizing the efficiency of multicore and manycore processors for the solution of sparse linear systems.
Comput. Sci. Res. Dev., 2016
Evaluating fault tolerance on asymmetric multicore systems-on-chip using iso-metrics.
IET Comput. Digit. Tech., 2016
Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processors.
Clust. Comput., 2016
Balancing Energy and Performance in Dense Linear System Solvers for Hybrid ARM+GPU platforms.
CLEI Electron. J., 2016
Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
The Impact of Panel Factorization on the Gauss-Huard Algorithm for the Solution of Linear Systems on Modern Architectures.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016
Tuning the Blocksize for Dense Linear Algebra Factorization Routines with the Roofline Model.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016
The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016
Exploiting Task-Parallelism in Message-Passing Sparse Linear System Solvers Using OmpSs.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
Proceedings of the CLOSER 2016, 2016
Proceedings of the High Performance Computing - Third Latin American Conference, 2016
2015
Exploring the performance-power-energy balance of low-power multicore and manycore architectures for anomaly detection in remote sensing.
J. Supercomput., 2015
Extending lyapack for the solution of band Lyapunov equations on hybrid CPU-GPU platforms.
J. Supercomput., 2015
IEEE ACM Trans. Comput. Biol. Bioinform., 2015
Systematic derivation of time and power models for linear algebra kernels on multicore architectures.
Sustain. Comput. Informatics Syst., 2015
Simul. Model. Pract. Theory, 2015
IEEE Geosci. Remote. Sens. Lett., 2015
Comput. Sci. Res. Dev., 2015
Revisiting Conventional Task Schedulers to Exploit Asymmetry in ARM big.LITTLE Architectures for Dense Linear Algebra.
CoRR, 2015
Performance and Energy Optimization of Matrix Multiplication on Asymmetric big.LITTLE Processors.
CoRR, 2015
Multi-Threaded Dense Linear Algebra Libraries for Low-Power Asymmetric Multicore Processors.
CoRR, 2015
Concurr. Comput. Pract. Exp., 2015
Concurr. Comput. Pract. Exp., 2015
Concurr. Comput. Pract. Exp., 2015
Unveiling the performance-energy trade-off in iterative linear system solvers for multithreaded processors.
Concurr. Comput. Pract. Exp., 2015
Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction.
Clust. Comput., 2015
Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi.
Comput. Electr. Eng., 2015
Proceedings of the 2015 Visual Communications and Image Processing, 2015
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, 2015
Revisiting the Gauss-Huard Algorithm for the Solution of Linear Systems on Graphics Accelerators.
Proceedings of the Parallel Processing and Applied Mathematics, 2015
A Parallel Multi-threaded Solver for Symmetric Positive Definite Bordered-Band Linear Systems.
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Proceedings of the Parallel Computing: On the Road to Exascale, 2015
Proceedings of the Parallel Computing: On the Road to Exascale, 2015
Performance and Fault Tolerance of Preconditioned Iterative Solvers on Low-Power ARM Architectures.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015
Evaluating the Potential of Low Power Systems for Headphone-based Spatial Audio Applications.
Proceedings of the International Conference on Computational Science, 2015
Real-time Sound Source Localization on an Embedded GPU Using a Spherical Microphone Array.
Proceedings of the International Conference on Computational Science, 2015
Proceedings of the 23rd European Signal Processing Conference, 2015
Proceedings of the Euro-Par 2015: Parallel Processing, 2015
Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Proceedings of the 2015 Latin American Computing Conference, 2015
Solving Linear Systems on the Intel Xeon-Phi Accelerator via the Gauss-Huard Algorithm.
Proceedings of the High Performance Computing - Second Latin American Conference, 2015
2014
Sustain. Comput. Informatics Syst., 2014
Assessing the Performance-Energy Balance of Graphics Processors for Spectral Unmixing.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2014
Efficient Implementation of Hyperspectral Anomaly Detection Techniques on GPUs and Multicore Processors.
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2014
IEEE J. Sel. Top. Appl. Earth Obs. Remote. Sens., 2014
Improved Accuracy and Parallelism for MRRR-Based Eigensolvers - A Mixed Precision Approach.
SIAM J. Sci. Comput., 2014
Parallel Comput., 2014
Leveraging task-parallelism in message-passing dense matrix factorizations using SMPSs.
Parallel Comput., 2014
A factored variant of the Newton iteration for the solution of algebraic Riccati equations via the matrix sign function.
Numer. Algorithms, 2014
Comput. Sci. Res. Dev., 2014
Modeling power and energy of the task-parallel Cholesky factorization on multicore processors.
Comput. Sci. Res. Dev., 2014
Modeling power and energy consumption of dense matrix factorizations on multicore processors.
Concurr. Comput. Pract. Exp., 2014
Enhancing performance and energy consumption of runtime schedulers for dense linear algebra.
Concurr. Comput. Pract. Exp., 2014
Assessing the impact of the CPU power-saving modes on the task-parallel solution of sparse linear systems.
Clust. Comput., 2014
Trading Off Performance for Energy in Linear Algebra Operations with Applications in Control Theory.
CLEI Electron. J., 2014
Proceedings of the 7th IEEE/ACM International Conference on Utility and Cloud Computing, 2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the IEEE 13th International Symposium on Parallel and Distributed Computing, 2014
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2014
Performance and Energy-Aware Characterization of the Sparse Matrix-Vector Multiplication on Multithreaded Architectures.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014
Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction.
Proceedings of the Computational Science and Its Applications - ICCSA 2014 - 14th International Conference, Guimarães, Portugal, June 30, 2014
Parallel performance and energy efficiency of modern video encoders on multithreaded architectures.
Proceedings of the 22nd European Signal Processing Conference, 2014
Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
Proceedings of the CLOSER 2014, 2014
Proceedings of the XL Latin American Computing Conference, 2014
Proceedings of the High Performance Computing - First HPCLATAM, 2014
2013
Exploring large macromolecular functional motions on clusters of multicore processors.
J. Comput. Phys., 2013
Performance versus energy consumption of hyperspectral unmixing algorithms on multi-core platforms.
EURASIP J. Adv. Signal Process., 2013
Concurr. Comput. Pract. Exp., 2013
Concurr. Comput. Pract. Exp., 2013
Energy-efficient execution of dense linear algebra algorithms on multi-core processors.
Clust. Comput., 2013
Proceedings of the 20th European MPI Users' Group Meeting, 2013
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Exploiting Data- and Task-Parallelism in the Solution of Riccati Equations on Multicore Servers and GPUs.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013
Reformulated Conjugate Gradient for the Energy-Aware Solution of Linear Systems on GPUs.
Proceedings of the 42nd International Conference on Parallel Processing, 2013
On the Impact of Optimization on the Time-Power-Energy Balance of Dense Linear Algebra Factorizations.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013
Proceedings of the Energy Efficiency in Large Scale Distributed Systems, 2013
Proceedings of the Energy Efficiency in Large Scale Distributed Systems, 2013
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
2012
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures.
ACM Trans. Math. Softw., 2012
ACM SIGOPS Oper. Syst. Rev., 2012
Parallel Computation of 3-D Soil-Structure Interaction in Time Domain with a Coupled FEM/SBFEM Approach.
J. Sci. Comput., 2012
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations.
J. Parallel Distributed Comput., 2012
Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors.
Comput. Sci. Res. Dev., 2012
DVFS-control techniques for dense linear algebra operations on multi-core processors.
Comput. Sci. Res. Dev., 2012
Appl. Math. Comput., 2012
Applying OOC Techniques in the Reduction to Condensed Form for Very Large Symmetric Eigenproblems on GPUs.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012
Analysis of Strategies to Save Energy for Message-Passing Dense Linear Algebra Kernels.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012
Saving Energy in the LU Factorization with Partial Pivoting on Multi-core Processors.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012
Reducing Energy Consumption of Dense Linear Algebra Operations on Hybrid CPU-GPU Platforms.
Proceedings of the 10th IEEE International Symposium on Parallel and Distributed Processing with Applications, 2012
Proceedings of the ICT as Key Technology against Global Warming, 2012
Proceedings of the 41st International Conference on Parallel Processing, 2012
Proceedings of the 19th International Conference on High Performance Computing, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
2011
J. Supercomput., 2011
J. Supercomput., 2011
J. Supercomput., 2011
A mixed-precision algorithm for the solution of Lyapunov equations on hybrid CPU-GPU platforms.
Parallel Comput., 2011
Exploiting thread-level parallelism in the iterative solution of sparse linear systems.
Parallel Comput., 2011
IEEE Geosci. Remote. Sens. Lett., 2011
Large-scale linear system solver using secondary storage: Self-energy in hybrid nanostructures.
Comput. Phys. Commun., 2011
Condensed forms for the symmetric eigenvalue problem on multi-threaded architectures.
Concurr. Comput. Pract. Exp., 2011
Appl. Math. Comput., 2011
Proceedings of the Parallel Processing and Applied Mathematics, 2011
Proceedings of the 19th International Euromicro Conference on Parallel, 2011
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
Power-aware Dense Linear Algebra Implementations on Multi-core and Many-core Processors.
Proceedings of the 3rd Many-core Applications Research Community (MARC) Symposium. Proceedings of the 3rd MARC Symposium, 2011
Evaluation of the Energy Performance of Dense Linear Algebra Kernels on Multi-core and Many-Core Processors.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Power Consumption of Mixed Precision in the Iterative Solution of Sparse Linear Systems.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011
Improving power efficiency of dense linear algebra algorithms on multi-core processors via slack control.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011
Proceedings of the International Conference on Parallel Processing, 2011
Proceedings of the Computational Science and Its Applications - ICCSA 2011, 2011
Proceedings of the 18th International Conference on High Performance Computing, 2011
Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011
2010
Proceedings of the Applied Parallel and Scientific Computing, 2010
Parallelization of Multilevel ILU Preconditioners on Distributed-Memory Multiprocessors.
Proceedings of the Applied Parallel and Scientific Computing, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010
Proceedings of the Architecture of Computing Systems, 2010
2009
ACM Trans. Math. Softw., 2009
Int. J. Parallel Emergent Distributed Syst., 2009
Parallel solution of large-scale algebraic Bernoulli equations with the matrix sign function method.
Int. J. Comput. Sci. Eng., 2009
Concurr. Comput. Pract. Exp., 2009
Concurr. Comput. Pract. Exp., 2009
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
Reduction to Condensed Forms for Symmetric Eigenvalue Problems on Multi-core Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2009
Evaluation of Parallel Sparse Matrix Partitioning Software for Parallel Multilevel ILU Preconditioning on Shared-Memory Multiprocessors.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009
Proceedings of the Eighth International Symposium on Parallel and Distributed Computing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the Euro-Par 2009 Parallel Processing, 2009
Proceedings of the Euro-Par 2009, 2009
Using Hybrid CPU-GPU Platforms to Accelerate the Computation of the Matrix Sign Function.
Proceedings of the Euro-Par 2009, 2009
Proceedings of the Euro-Par 2009 Parallel Processing, 2009
2008
Optim. Methods Softw., 2008
Proceedings of the High Performance Computing for Computational Science, 2008
Attaining High Performance in General-Purpose Computations on Current Graphics Processors.
Proceedings of the High Performance Computing for Computational Science, 2008
Proceedings of the High Performance Computing for Computational Science, 2008
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Proceedings of the 16th Euromicro International Conference on Parallel, 2008
Design of scalable dense linear algebra libraries for multithreaded architectures: the LU factorization.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the Euro-Par 2008, 2008
2007
Efficient algorithms for generalized algebraic Bernoulli equations based on the matrix sign function.
Numer. Algorithms, 2007
Stabilizing large-scale generalized systems on parallel computers using multithreading and message-passing.
Concurr. Comput. Pract. Exp., 2007
Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007
Proceedings of the Parallel Processing and Applied Mathematics, 2007
Proceedings of the Parallel Processing and Applied Mathematics, 2007
Strategies for Parallelizing the Solution of Rational Matrix Equations.
Proceedings of the Parallel Computing: Architectures, 2007
Parallelization of Multilevel Preconditioners Constructed from Inverse-Based ILUs on Shared-Memory Multiprocessors.
Proceedings of the Parallel Computing: Architectures, 2007
Proceedings of the Large-Scale Scientific Computing, 6th International Conference, 2007
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007
2006
J. Sci. Comput., 2006
Proceedings of the 14th Euromicro International Conference on Parallel, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Specialized Spectral Division Algorithms for Generalized Eigenproblems Via the Inverse-Free Iteration.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Parallel and Distributed Processing and Applications, 2006
Proceedings of the High Performance Computing and Communications, 2006
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006
2005
Representing linear algebra algorithms in code: the FLAME application program interfaces.
ACM Trans. Math. Softw., 2005
ACM Trans. Math. Softw., 2005
Int. J. Comput. Sci. Eng., 2005
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005
Parallelization of GSL on Clusters of Symmetric Multiprocessors.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005
Parallel Order Reduction via Balanced Truncation for Optimal Cooling of Steel Profiles.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005
Proceedings of the 44th IEEE Conference on Decision and Control and 8th European Control Conference, 2005
2004
Math. Comput., 2004
Proceedings of the High Performance Computing for Computational Science, 2004
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004
Proceedings of the 2004 International Conference on Parallel Computing in Electrical Engineering (PARELEC 2004), 2004
Proceedings of the Applied Parallel Computing, 2004
Proceedings of the Applied Parallel Computing, 2004
Proceedings of the Applied Parallel Computing, 2004
Proceedings of the Applied Parallel Computing, 2004
Proceedings of the 3rd International Symposium on Parallel and Distributed Computing (ISPDC 2004), 2004
Proceedings of the 43rd IEEE Conference on Decision and Control, 2004
2003
ACM Trans. Math. Softw., 2003
Parallel Comput., 2003
Int. J. Syst. Sci., 2003
Parallel Model Reduction of Large-Scale Unstable Systems.
Proceedings of the Parallel Computing: Software Technology, 2003
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
2002
SIAM J. Sci. Comput., 2002
Parallel Algorithms Appl., 2002
J. Parallel Distributed Comput., 2002
Proceedings of the High Performance Computing for Computational Science, 2002
Proceedings of the Applied Parallel Computing Advanced Scientific Computing, 2002
Proceedings of the Euro-Par 2002, 2002
2001
J. Parallel Distributed Comput., 2001
Concurr. Comput. Pract. Exp., 2001
Proceedings of the 30th International Workshops on Parallel Processing (ICPP 2001 Workshops), 2001
Proceedings of the 2001 International Conference on Dependable Systems and Networks (DSN 2001) (formerly: FTCS), 2001
2000
J. Supercomput., 2000
Solving algebraic Riccati equations on parallel computers using Newton's method with exact line search.
Parallel Comput., 2000
Parallel Spectral Division Using the Matrix Sign Function for the Generalized Eigenproblem.
Int. J. High Speed Comput., 2000
Proceedings of the Vector and Parallel Processing, 2000
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
1999
Parallel Process. Lett., 1999
Numer. Algorithms, 1999
Fast Parallel Kernels for Selected Problems in Control Theory.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999
1998
SIAM J. Sci. Comput., 1998
Autom., 1998
A Portable Subroutine Library for Solving Linear Control Problems on Distributed Memory Computers.
Proceedings of the Workshop on Wide Area Networks and High Performance Computing, 1998
1996
Solving Discrete-Time Lyapunov Equations for the Cholesky Factor on a Shared Memory Multiprocessor.
Parallel Process. Lett., 1996
Proceedings of the Vector and Parallel Processing, 1996
1995
A Parallel Triangular Sylvester Equation Solver Based on the Hessenberg-Schur Method.
Parallel Algorithms Appl., 1995
An Efficient Parallel Sylvester Equation Solver Based on the Hessenberg-Schur Method.
Parallel Algorithms Appl., 1995
1993
Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993