Jakub Kurzak
ORCID: 0000-0002-9697-0145
According to our database, Jakub Kurzak authored at least 84 papers between 2005 and 2024.
Bibliography
2024
Breaking the Million-Electron and 1 EFLOP/s Barriers: Biomolecular-Scale Ab Initio Molecular Dynamics Using MP2 Potentials.
Proceedings of the International Conference for High Performance Computing, 2024
2023
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the International Conference for High Performance Computing, 2023
2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
2021
ACM Trans. Math. Softw., 2021
2019
ACM Trans. Math. Softw., 2019
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the 48th International Conference on Parallel Processing, 2019
Proceedings of the Euro-Par 2019: Parallel Processing, 2019
2018
IEEE Trans. Parallel Distributed Syst., 2018
Supercomput. Front. Innov., 2018
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018
Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators.
Proc. IEEE, 2018
Int. J. Comput. Sci. Eng., 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
2017
Design and Implementation of the PULSAR Programming System for Large Scale Computing.
Supercomput. Front. Innov., 2017
Int. J. Parallel Program., 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
Proceedings of the Handbook of Big Data Technologies, 2017
2016
Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs.
IEEE Trans. Parallel Distributed Syst., 2016
Acta Numer., 2016
Proceedings of the High Performance Computing, 2016
Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks.
Proceedings of the 2nd Workshop on Machine Learning in HPC Environments, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
2015
Supercomput. Front. Innov., 2015
Concurr. Comput. Pract. Exp., 2015
Concurr. Comput. Pract. Exp., 2015
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster.
Proceedings of the International Conference for High Performance Computing, 2015
Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015
2014
Supercomput. Front. Innov., 2014
Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.
Parallel Process. Lett., 2014
Proceedings of the Second IEEE Working Conference on Software Visualization, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014
Proceedings of the Numerical Computations with GPUs, 2014
2013
IEEE Trans. Parallel Distributed Syst., 2013
An improved parallel singular value algorithm and its implementation for multicore hardware.
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013
2012
IEEE Trans. Parallel Distributed Syst., 2012
Proceedings of the High Performance Computing for Computational Science, 2012
Proceedings of the Transition of HPC Towards Exascale Computing, 2012
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications., 2012
2011
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
2010
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2010
Scheduling two-sided transformations using tile algorithms on multicore architectures.
Sci. Program., 2010
Concurr. Comput. Pract. Exp., 2010
Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010
Proceedings of the Applied Parallel and Scientific Computing, 2010
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
2009
Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor.
Parallel Comput., 2009
Parallel Comput., 2009
Comput. Phys. Commun., 2009
2008
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization.
IEEE Trans. Parallel Distributed Syst., 2008
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy.
ACM Trans. Math. Softw., 2008
Automatic Generation of FFT for Translations of Multipole Expansions in Spherical Harmonics.
Int. J. High Perform. Comput. Appl., 2008
Concurr. Comput. Pract. Exp., 2008
Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor.
Proceedings of the Computational Science, 2008
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008
2007
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications., 2007
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems.
Int. J. High Perform. Comput. Appl., 2007
Implementation of mixed precision in solving systems of linear equations on the Cell processor.
Concurr. Comput. Pract. Exp., 2007
Introduction to Programming High Performance Applications on the CELL Broadband Engine.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007
2006
Tools and techniques for performance - Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems).
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Poster reception - Targeting multi-core architectures for linear algebra applications.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Implementing Linear Algebra Routines on Multi-core Processors with Pipelining and a Look Ahead.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006
2005
Massively parallel implementation of a fast multipole method for distributed memory machines.
J. Parallel Distributed Comput., 2005