Piotr Luszczek
Orcid: 0000-0002-0089-6965
Affiliations:
- University of Tennessee, Knoxville, TN, USA
According to our database, Piotr Luszczek authored at least 158 papers between 1998 and 2024.
Bibliography
2024
Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload.
Int. J. High Perform. Comput. Appl., 2024
Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications.
Future Gener. Comput. Syst., 2024
What is Normal? A Big Data Observational Science Model of Anonymized Internet Traffic.
CoRR, 2024
2023
Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering.
Int. J. High Perform. Comput. Appl., July, 2023
Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software.
CoRR, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Towards the FAIR Asset Tracking Across Models, Datasets, and Performance Evaluation Scenarios.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023
2022
IEEE Trans. Parallel Distributed Syst., 2022
Int. J. Parallel Emergent Distributed Syst., 2022
Comput. Sci. Eng., 2022
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022
Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices.
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022
High-Performance GMRES Multi-Precision Benchmark: Design, Performance, and Challenges.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022
Deep Gaussian process with multitask and transfer learning for performance optimization.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022
2021
ACM Trans. Math. Softw., 2021
Int. J. High Perform. Comput. Appl., 2021
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
2020
Software for Linear Algebra Targeting Exascale (SLATE) with a Recursive Butterfly Transform based solver.
Dataset, August, 2020
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of The Third International Workshop on Computer Modeling and Intelligent Systems (CMIS-2020), 2020
2019
ACM Trans. Math. Softw., 2019
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019
Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
2018
Supercomput. Front. Innov., 2018
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018
Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators.
Proc. IEEE, 2018
Int. J. Comput. Sci. Eng., 2018
2017
Design and Implementation of the PULSAR Programming System for Large Scale Computing.
Supercomput. Front. Innov., 2017
Int. J. Parallel Program., 2017
Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
Proceedings of the Handbook of Big Data Technologies, 2017
2016
High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems.
Int. J. High Perform. Comput. Appl., 2016
Acta Numer., 2016
Proceedings of the High Performance Computing, 2016
Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks.
Proceedings of the 2nd Workshop on Machine Learning in HPC Environments, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
2015
Supercomput. Front. Innov., 2015
Sci. Program., 2015
Int. J. High Perform. Comput. Appl., 2015
Concurr. Comput. Pract. Exp., 2015
Concurr. Comput. Pract. Exp., 2015
A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.
Proceedings of the High Performance Computing - 30th International Conference, 2015
Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster.
Proceedings of the International Conference for High Performance Computing, 2015
Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015
Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
2014
Supercomput. Front. Innov., 2014
Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.
Parallel Process. Lett., 2014
Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting.
Concurr. Comput. Pract. Exp., 2014
Comput. J., 2014
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014
Proceedings of the International Workshop on OpenCL, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014
Proceedings of the Numerical Computations with GPUs, 2014
2013
IEEE Trans. Parallel Distributed Syst., 2013
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures.
ACM Trans. Math. Softw., 2013
J. Comput. Sci., 2013
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013
Proceedings of the International Conference for High Performance Computing, 2013
An improved parallel singular value algorithm and its implementation for multicore hardware.
Proceedings of the International Conference for High Performance Computing, 2013
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013
2012
SIGMETRICS Perform. Evaluation Rev., 2012
Proceedings of the International Conference on Computational Science, 2012
Proceedings of the International Conference on Computational Science, 2012
From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming.
Parallel Comput., 2012
Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency.
Comput. Sci. Res. Dev., 2012
Proceedings of the High Performance Computing for Computational Science, 2012
A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012
Proceedings of the IEEE Conference on High Performance Extreme Computing, 2012
Proceedings of the Transition of HPC Towards Exascale Computing, 2012
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications, 2012
2011
High performance matrix inversion based on LU factorization for multicore architectures.
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, 2011
Reducing the Time to Tune Parallel Dense Linear Algebra Routines with Partial Execution and Performance Modeling.
Proceedings of the Parallel Processing and Applied Mathematics, 2011
Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction.
Proceedings of the Parallel Processing and Applied Mathematics, 2011
Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
2010
Proceedings of the 39th International Conference on Parallel Processing, 2010
2009
Comput. Phys. Commun., 2009
2008
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy.
ACM Trans. Math. Softw., 2008
DARPA's HPCS Program: History, Models, Tools, Languages.
Adv. Comput., 2008
2007
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications, 2007
High Performance Development for High End Computing With Python Language Wrapper (PLW).
Int. J. High Perform. Comput. Appl., 2007
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems.
Int. J. High Perform. Comput. Appl., 2007
2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Tools and techniques for performance - Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems).
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006
2004
Design of Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations.
Proceedings of the Computational Science, 2004
Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37 2004), 2004
2003
Parallel Comput., 2003
Proceedings of the Computational Science - ICCS 2003, 2003
2001
Future Gener. Comput. Syst., 2001
2000
Proceedings of the High-Performance Computing and Networking, 8th International Conference, 2000
1999
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1999
1998
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1998