Hartwig Anzt

Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022

Prediction of Optimal Solvers for Sparse Linear Systems Using Deep Learning.

Yannick Funk

Markus Götz

Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing, 2022

Mixed Precision Algebraic Multigrid on GPUs.

Yu-Hsiang Mike Tsai

Natalie Beams

Proceedings of the Parallel Processing and Applied Mathematics, 2022

Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Lightning Talks of EduHPC 2022.

Proceedings of the IEEE/ACM International Workshop on Education for High Performance Computing, 2022

2021

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software.

ACM Trans. Math. Softw., 2021

Crediting pull requests to open source research software as an academic contribution.

Eileen Kuehn

J. Comput. Sci., 2021

Evaluating asynchronous Schwarz solvers on GPUs.

Int. J. High Perform. Comput. Appl., 2021

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.

Int. J. High Perform. Comput. Appl., 2021

Porting a sparse linear algebra math library to Intel GPUs.

CoRR, 2021

A Fresh Look at FAIR for Research Software.

CoRR, 2021

Batched Sparse Iterative Solvers for Computational Chemistry Simulations on GPUs.

Proceedings of the 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2021

A Collaborative Peer Review Process for Grading Coding Assignments.

Fritz Göbel

Proceedings of the Computational Science - ICCS 2021, 2021

Porting Sparse Linear Algebra to Intel GPUs.

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs.

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020

The Research Software Alliance (ReSA) and the community landscape.

Daniel S. Katz

Michelle Barker

Paula Andrea Martínez

Alejandra N. González-Beltrán

Tom Bakker

Dataset, March, 2020

Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation.

ACM Trans. Parallel Comput., 2020

Load-balancing Sparse Matrix Vector Product Kernels on GPUs.

ACM Trans. Parallel Comput., 2020

Parallel selection on GPUs.

Tobias Ribizel

Parallel Comput., 2020

Ginkgo: A high performance numerical linear algebra library.

J. Open Source Softw., 2020

An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action.

F1000Research, 2020

Compressed Basis GMRES on High Performance GPUs.

CoRR, 2020

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations.

Yuhsiang Mike Tsai

CoRR, 2020

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.

CoRR, 2020

An Environment for Sustainable Research Software in Germany and Beyond: Current State, Open Challenges, and Call for Action.

CoRR, 2020

Evaluating Abstract Asynchronous Schwarz solvers.

CoRR, 2020

A customized precision format based on mantissa segmentation for accelerating sparse linear algebra.

Concurr. Comput. Pract. Exp., 2020

Sparse Linear Algebra on AMD and NVIDIA GPUs - The Race Is On.

Proceedings of the High Performance Computing - 35th International Conference, 2020

Two-stage Asynchronous Iterative Solvers for multi-GPU Clusters.

Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations.

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Scalable Data Generation for Evaluating Mixed-Precision Solvers.

Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Preparing Ginkgo for AMD GPUs - A Testimonial on Porting CUDA Code to HIP.

Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

Multiprecision Block-Jacobi for Iterative Triangular Solves.

Proceedings of the Euro-Par 2020: Parallel Processing, 2020

Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs.

Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

2019

Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors.

Parallel Comput., 2019

Fine-grained bit-flip protection for relaxation methods.

J. Comput. Sci., 2019

PAPI software-defined events for in-depth performance analysis.

Int. J. High Perform. Comput. Appl., 2019

Toward a modular precision ecosystem for high-performance computing.

Int. J. High Perform. Comput. Appl., 2019

Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers.

Concurr. Comput. Pract. Exp., 2019

Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software.

Weichung Wang

Proceedings of the Platform for Advanced Scientific Computing Conference, 2019

Approximate and Exact Selection on GPUs.

Tobias Ribizel

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

ParILUT - A Parallel Threshold ILU for GPUs.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Are we Doing the Right Thing? - A Critical Analysis of the Academic HPC Community.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

2018

ParILUT - A New Parallel Threshold ILU Factorization.

SIAM J. Sci. Comput., 2018

Incomplete Sparse Approximate Inverses for Parallel Preconditioning.

Parallel Comput., 2018

Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning.

J. Parallel Distributed Comput., 2018

Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs.

Int. J. High Perform. Comput. Appl., 2018

Residual Replacement in Mixed-Precision Iterative Refinement for Sparse Linear Systems.

Vedran Novakovic

Proceedings of the High Performance Computing, 2018

High-Performance GPU Implementation of PageRank with Reduced Precision Based on Mantissa Segmentation.

Florian Scheidegger

Proceedings of the 8th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2018

Variable-Size Batched Condition Number Calculation on GPUs.

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs.

Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

A Modular Precision Format for Decoupling Arithmetic Format and Storage Format.

Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

2017

Preconditioned Krylov solvers on GPUs.

Parallel Comput., 2017

On the performance and energy efficiency of sparse linear algebra on GPUs.

Stanimire Tomov

Int. J. High Perform. Comput. Appl., 2017

With Extreme Computing, the Rules Have Changed.

Comput. Sci. Eng., 2017

Overcoming Load Imbalance for Irregular Sparse Matrices.

Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017

Flexible batched sparse matrix-vector product on GPUs.

Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs.

Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning.

Proceedings of the 46th International Conference on Parallel Processing, 2017

Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning.

Proceedings of the International Conference on Computational Science, 2017

Bringing High Performance Computing to Big Data Algorithms.

Proceedings of the Handbook of Big Data Technologies, 2017

2016

Domain Overlap for Iterative Sparse Triangular Solves on GPUs.

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs.

IEEE Trans. Parallel Distributed Syst., 2016

Updating incomplete factorization preconditioners for model order reduction.

Numer. Algorithms, 2016

Linear algebra software for large-scale accelerated multicore computing.

Acta Numer., 2016

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations.

Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

Batched Generation of Incomplete Sparse Approximate Inverses on GPUs.

Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2016

Heterogeneous Streaming.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Efficiency of General Krylov Methods on GPUs - An Experimental Study.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Acceleration of GPU-based Krylov solvers via data transfer reduction.

Int. J. High Perform. Comput. Appl., 2015

Experiences in autotuning matrix multiplication for energy minimization on GPUs.

Concurr. Comput. Pract. Exp., 2015

Unveiling the performance-energy trade-off in iterative linear system solvers for multithreaded processors.

Maribel Castillo

Germán León

Joaquín Pérez

Concurr. Comput. Pract. Exp., 2015

Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs.

Proceedings of the High Performance Computing - 30th International Conference, 2015

Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product.

Stanimire Tomov

Proceedings of the Symposium on High Performance Computing, 2015

GPU-accelerated co-design of induced dimension reduction: algorithmic fusion and kernel overlap.

Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015

Tuning stationary iterative solvers for fault resilience.

Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Adaptive precision solvers for sparse linear systems.

Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, 2015

Energy efficiency and performance frontiers for sparse computations on GPU supercomputers.

Stanimire Tomov

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Iterative Sparse Triangular Solves for Preconditioning.

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Accelerating collaborative filtering using concepts from high performance computing.

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014

A unified energy footprint for simulation software.

Comput. Sci. Res. Dev., 2014

Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures.

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Hybrid Multi-elimination ILU Preconditioners on GPUs.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Optimizing Krylov Subspace Solvers on Graphics Processing Units.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2013

A block-asynchronous relaxation method for graphics processing units.

J. Parallel Distributed Comput., 2013

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures.

Maribel Castillo

Germán León

Joaquín Pérez

Proceedings of the Parallel Processing and Applied Mathematics, 2013

Reformulated Conjugate Gradient for the Energy-Aware Solution of Linear Systems on GPUs.

Joaquín Pérez

Proceedings of the 42nd International Conference on Parallel Processing, 2013

2012

Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems.

Proceedings of the International Conference on Computational Science, 2012

Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors.

Maribel Castillo

Vincent Heuveline

Francisco D. Igual

Rafael Mayo

Andreas Helfrich-Schkarbanenko

Comput. Sci. Res. Dev., 2012

Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems.

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011

HiFlow³: A Hardware-Aware Parallel Finite Element Package.

Sebastian Ritterbusch

Staffan Ronnas

Michael Schick

Mareike Schmidtobreick

Chandramowli Subramanian

Jan-Philipp Weiss

Florian Wilhelm

Martin Wlotzka

Proceedings of the Tools for High Performance Computing 2011, 2011

Power Consumption of Mixed Precision in the Iterative Solution of Sparse Linear Systems.

Rafael Mayo

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms.

Rafael Mayo

Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

2010

Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms - An evaluation of different solver and hardware configurations.

Björn Rocker

Vincent Heuveline

Comput. Sci. Res. Dev., 2010

An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations.
