Hartwig Anzt

Orcid: 0000-0003-2177-952X

Affiliations:
  • University of Tennessee, Knoxville, TN, USA


According to our database, Hartwig Anzt authored at least 127 papers between 2010 and 2025.

Bibliography

2025
JuMonC: A RESTful tool for enabling monitoring and control of simulations at scale.
Future Gener. Comput. Syst., 2025

Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing.
Future Gener. Comput. Syst., 2025

2024
Ginkgo - A math library designed to accelerate Exascale Computing Project science applications.
Int. J. High Perform. Comput. Appl., 2024

Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications.
Future Gener. Comput. Syst., 2024

Then and Now: Improving Software Portability, Productivity, and 100× Performance.
Comput. Sci. Eng., 2024

FRSZ2 for In-Register Block Compression Inside GMRES on GPUs.
CoRR, 2024

A Probabilistic Model for Asynchronous Iterative Methods.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
Three-precision algebraic multigrid on GPUs.
Future Gener. Comput. Syst., December, 2023

Integrating batched sparse iterative solvers for the collision operator in fusion plasma simulations on GPUs.
J. Parallel Distributed Comput., August, 2023

Fast truncated SVD of sparse and dense matrices on graphics processors.
Int. J. High Perform. Comput. Appl., July, 2023

Compressed basis GMRES on high-performance graphics processing units.
Int. J. High Perform. Comput. Appl., March, 2023

Using Ginkgo's memory accessor for improving the accuracy of memory-bound low precision BLAS.
Softw. Pract. Exp., 2023

Earth Virtualization Engines: A Technical Perspective.
Comput. Sci. Eng., 2023

GPU-Resident Sparse Direct Linear Solvers for Alternating Current Optimal Power Flow Analysis.
CoRR, 2023

Providing performance portable numerics for Intel GPUs.
Concurr. Comput. Pract. Exp., 2023

Sparse matrix-vector and matrix-multivector products for the truncated SVD on graphics processors.
Concurr. Comput. Pract. Exp., 2023

A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs.
Proceedings of the High Performance Computing - 38th International Conference, 2023

Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Parallel Symbolic Cholesky Factorization.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Porting Batched Iterative Solvers onto Intel GPUs with SYCL.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

PAQR: Pivoting Avoiding QR factorization.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Utilizing batched solver ideas for efficient solution of non-batched linear systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

BDDC Preconditioning on GPUs for Cardiac Simulations.
Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

2022
Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing.
ACM Trans. Math. Softw., 2022

Ginkgo - A math library designed for platform portability.
Parallel Comput., 2022

Resiliency in numerical algorithm design for extreme scale simulations.
Int. J. High Perform. Comput. Appl., 2022

Compression and load balancing for efficient sparse matrix-vector product on multicore processors and graphics processing units.
Concurr. Comput. Pract. Exp., 2022

Preconditioners for Batched Iterative Linear Solvers on GPUs.
Proceedings of the Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation, 2022

Implementing Asynchronous Jacobi Iteration on GPUs.
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022

Prediction of Optimal Solvers for Sparse Linear Systems Using Deep Learning.
Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing, 2022

Mixed Precision Algebraic Multigrid on GPUs.
Proceedings of the Parallel Processing and Applied Mathematics, 2022

Batched sparse iterative solvers on GPU for the collision operator for fusion plasma simulations.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software.
ACM Trans. Math. Softw., 2021

Crediting pull requests to open source research software as an academic contribution.
J. Comput. Sci., 2021

Evaluating asynchronous Schwarz solvers on GPUs.
Int. J. High Perform. Comput. Appl., 2021

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.
Int. J. High Perform. Comput. Appl., 2021

Porting a sparse linear algebra math library to Intel GPUs.
CoRR, 2021

A Fresh Look at FAIR for Research Software.
CoRR, 2021

Batched Sparse Iterative Solvers for Computational Chemistry Simulations on GPUs.
Proceedings of the 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2021

A Collaborative Peer Review Process for Grading Coding Assignments.
Proceedings of the Computational Science - ICCS 2021, 2021

Porting Sparse Linear Algebra to Intel GPUs.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

Mixed Precision Incomplete and Factorized Sparse Approximate Inverse Preconditioning on GPUs.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020
The Research Software Alliance (ReSA) and the community landscape.
Dataset, March, 2020

Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation.
ACM Trans. Parallel Comput., 2020

Load-balancing Sparse Matrix Vector Product Kernels on GPUs.
ACM Trans. Parallel Comput., 2020

Parallel selection on GPUs.
Parallel Comput., 2020

Ginkgo: A high performance numerical linear algebra library.
J. Open Source Softw., 2020

An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action.
F1000Research, 2020

Compressed Basis GMRES on High Performance GPUs.
CoRR, 2020

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse Linear Algebra Computations.
CoRR, 2020

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.
CoRR, 2020

An Environment for Sustainable Research Software in Germany and Beyond: Current State, Open Challenges, and Call for Action.
CoRR, 2020

Evaluating Abstract Asynchronous Schwarz solvers.
CoRR, 2020

A customized precision format based on mantissa segmentation for accelerating sparse linear algebra.
Concurr. Comput. Pract. Exp., 2020

Sparse Linear Algebra on AMD and NVIDIA GPUs - The Race Is On.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Two-stage Asynchronous Iterative Solvers for multi-GPU Clusters.
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020

Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Scalable Data Generation for Evaluating Mixed-Precision Solvers.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

Preparing Ginkgo for AMD GPUs - A Testimonial on Porting CUDA Code to HIP.
Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

Multiprecision Block-Jacobi for Iterative Triangular Solves.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

Balanced and Compressed Coordinate Layout for the Sparse Matrix-Vector Product on GPUs.
Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

2019
Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors.
Parallel Comput., 2019

Fine-grained bit-flip protection for relaxation methods.
J. Comput. Sci., 2019

PAPI software-defined events for in-depth performance analysis.
Int. J. High Perform. Comput. Appl., 2019

Toward a modular precision ecosystem for high-performance computing.
Int. J. High Perform. Comput. Appl., 2019

Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers.
Concurr. Comput. Pract. Exp., 2019

Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2019

Approximate and Exact Selection on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

ParILUT - A Parallel Threshold ILU for GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Are we Doing the Right Thing? - A Critical Analysis of the Academic HPC Community.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

2018
ParILUT - A New Parallel Threshold ILU Factorization.
SIAM J. Sci. Comput., 2018

Incomplete Sparse Approximate Inverses for Parallel Preconditioning.
Parallel Comput., 2018

Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning.
J. Parallel Distributed Comput., 2018

Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs.
Int. J. High Perform. Comput. Appl., 2018

Residual Replacement in Mixed-Precision Iterative Refinement for Sparse Linear Systems.
Proceedings of the High Performance Computing, 2018

High-Performance GPU Implementation of PageRank with Reduced Precision Based on Mantissa Segmentation.
Proceedings of the 8th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2018

Variable-Size Batched Condition Number Calculation on GPUs.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

A Jaccard Weights Kernel Leveraging Independent Thread Scheduling on GPUs.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

A Modular Precision Format for Decoupling Arithmetic Format and Storage Format.
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

2017
Preconditioned Krylov solvers on GPUs.
Parallel Comput., 2017

On the performance and energy efficiency of sparse linear algebra on GPUs.
Int. J. High Perform. Comput. Appl., 2017

With Extreme Computing, the Rules Have Changed.
Comput. Sci. Eng., 2017

Overcoming Load Imbalance for Irregular Sparse Matrices.
Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017

Flexible batched sparse matrix-vector product on GPUs.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

Batched Gauss-Jordan Elimination for Block-Jacobi Preconditioner Generation on GPUs.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017

Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Variable-Size Batched Gauss-Huard for Block-Jacobi Preconditioning.
Proceedings of the International Conference on Computational Science, 2017

Bringing High Performance Computing to Big Data Algorithms.
Proceedings of the Handbook of Big Data Technologies, 2017

2016
Domain Overlap for Iterative Sparse Triangular Solves on GPUs.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs.
IEEE Trans. Parallel Distributed Syst., 2016

Updating incomplete factorization preconditioners for model order reduction.
Numer. Algorithms, 2016

Linear algebra software for large-scale accelerated multicore computing.
Acta Numer., 2016

Accelerating the Conjugate Gradient Algorithm with GPUs in CFD Simulations.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

Batched Generation of Incomplete Sparse Approximate Inverses on GPUs.
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2016

Heterogeneous Streaming.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Efficiency of General Krylov Methods on GPUs - An Experimental Study.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Acceleration of GPU-based Krylov solvers via data transfer reduction.
Int. J. High Perform. Comput. Appl., 2015

Experiences in autotuning matrix multiplication for energy minimization on GPUs.
Concurr. Comput. Pract. Exp., 2015

Unveiling the performance-energy trade-off in iterative linear system solvers for multithreaded processors.
Concurr. Comput. Pract. Exp., 2015

Asynchronous Iterative Algorithm for Computing Incomplete Factorizations on GPUs.
Proceedings of the High Performance Computing - 30th International Conference, 2015

Accelerating the LOBPCG method on GPUs using a blocked sparse matrix vector product.
Proceedings of the Symposium on High Performance Computing, 2015

GPU-accelerated co-design of induced dimension reduction: algorithmic fusion and kernel overlap.
Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015

Tuning stationary iterative solvers for fault resilience.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Adaptive precision solvers for sparse linear systems.
Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, 2015

Energy efficiency and performance frontiers for sparse computations on GPU supercomputers.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Iterative Sparse Triangular Solves for Preconditioning.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Accelerating collaborative filtering using concepts from high performance computing.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014
A unified energy footprint for simulation software.
Comput. Sci. Res. Dev., 2014

Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Hybrid Multi-elimination ILU Preconditioners on GPUs.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Optimizing Krylov Subspace Solvers on Graphics Processing Units.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2013
A block-asynchronous relaxation method for graphics processing units.
J. Parallel Distributed Comput., 2013

Performance and Energy Analysis of the Iterative Solution of Sparse Linear Systems on Multicore and Manycore Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Reformulated Conjugate Gradient for the Energy-Aware Solution of Linear Systems on GPUs.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

2012
Block-asynchronous Multigrid Smoothers for GPU-accelerated Systems.
Proceedings of the International Conference on Computational Science, 2012

Optimization of power consumption in the iterative solution of sparse linear systems on graphics processors.
Comput. Sci. Res. Dev., 2012

Weighted Block-Asynchronous Iteration on GPU-Accelerated Systems.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Power Consumption of Mixed Precision in the Iterative Solution of Sparse Linear Systems.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Analysis and optimization of power consumption in the iterative solution of sparse linear systems on multi-core and many-core platforms.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

2010
Energy efficiency of mixed precision iterative refinement methods using hybrid hardware platforms - An evaluation of different solver and hardware configurations.
Comput. Sci. Res. Dev., 2010

An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

Mixed Precision Iterative Refinement Methods for Linear Systems: Convergence Analysis Based on Krylov Subspace Methods.
Proceedings of the Applied Parallel and Scientific Computing, 2010

