Georg Hager
Orcid: 0000-0002-8723-2781
Affiliations:
- Erlangen National High Performance Computing Center, Germany
According to our database, Georg Hager authored at least 134 papers between 2002 and 2024.
Bibliography
2024
Analytic Roofline Modeling and Energy Analysis of LULESH Proxy Application on Multi-Core Clusters.
CoRR, 2024
Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms.
Future Gener. Comput. Syst., December, 2023
Making applications faster by asynchronous execution: Slowing down processes or relaxing MPI collectives.
Future Gener. Comput. Syst., November, 2023
ACM Trans. Parallel Comput., September, 2023
J. Parallel Distributed Comput., March, 2023
IEEE Trans. Parallel Distributed Syst., February, 2023
The Role of Idle Waves, Desynchronization, and Bottleneck Evasion in the Performance of Parallel Programs.
IEEE Trans. Parallel Distributed Syst., February, 2023
CoRR, 2023
MD-Bench: Engineering the in-core performance of short-range molecular dynamics kernels from state-of-the-art simulation packages.
CoRR, 2023
Core-Level Performance Engineering with the Open-Source Architecture Code Analyzer (OSACA) and the Compiler Explorer.
Proceedings of the Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023
Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023
SPEChpc 2021 Benchmarks on Ice Lake and Sapphire Rapids Infiniband Clusters: A Performance and Energy Case Study.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
2022
Execution-Cache-Memory modeling and performance tuning of sparse matrix-vector multiplication and Lattice quantum chromodynamics on A64FX.
Concurr. Comput. Pract. Exp., 2022
Concurr. Comput. Pract. Exp., 2022
Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications.
Proceedings of the Parallel Processing and Applied Mathematics, 2022
Proceedings of the SIGSIM-PADS '22: SIGSIM Conference on Principles of Advanced Discrete Simulation, Atlanta, GA, USA, June 8, 2022
2021
A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.
Int. J. High Perform. Comput. Appl., 2021
Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs.
Int. J. High Perform. Comput. Appl., 2021
Analytic Modeling of Idle Waves in Parallel Programs: Communication, Cluster Topology, and Noise Impact.
Proceedings of the High Performance Computing - 36th International Conference, 2021
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021
YaskSite: Stencil Optimization Techniques Applied to Explicit ODE Methods on Modern Architectures.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021
2020
Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020
A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication.
ACM Trans. Parallel Comput., 2020
ACM Trans. Math. Softw., 2020
Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server Processors.
Supercomput. Front. Innov., 2020
Int. J. High Perform. Comput. Appl., 2020
An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs.
CoRR, 2020
Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors.
Proceedings of the High Performance Computing - 35th International Conference, 2020
Desynchronization and Wave Pattern Formation in MPI-Parallel and Hybrid Memory-Bound Programs.
Proceedings of the High Performance Computing - 35th International Conference, 2020
Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020
2019
CRAFT: A Library for Easier Application-Level Checkpoint/Restart and Automatic Fault Tolerance.
IEEE Trans. Parallel Distributed Syst., 2019
Supercomput. Front. Innov., 2019
Delay Propagation and Overlapping Mechanisms on Clusters: A Case Study of Idle Periods based on Workload, Communication, and Delay Granularity.
CoRR, 2019
CoRR, 2019
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019
Proceedings of the Parallel Processing and Applied Mathematics, 2019
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
2018
ACM Trans. Parallel Comput., 2018
Int. J. High Perform. Comput. Appl., 2018
CoRR, 2018
Proceedings of the High Performance Computing - 33rd International Conference, 2018
On the Accuracy and Usefulness of Analytic Energy Models for Contemporary Multicore Processors.
Proceedings of the High Performance Computing - 33rd International Conference, 2018
Automated Instruction Stream Throughput Prediction for Intel and AMD Microarchitectures.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018
Multicore Performance Engineering of Sparse Triangular Solves Using a Modified Roofline Model.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018
2017
GHOST: Building Blocks for High Performance Sparse Linear Algebra on Heterogeneous Systems.
Int. J. Parallel Program., 2017
Validation of hardware events for successful performance pattern identification in High Performance Computing.
CoRR, 2017
PVSC-DTM: A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials.
CoRR, 2017
Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors.
Concurr. Comput. Pract. Exp., 2017
An Analysis of Core- and Chip-Level Architectural Features in Four Generations of Intel Server Processors.
Proceedings of the High Performance Computing - 32nd International Conference, 2017
LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
2016
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016
Performance Engineering and Energy Efficiency of Building Blocks for Large, Sparse Eigenvalue Computations on Heterogeneous Supercomputers.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016
High-performance implementation of Chebyshev filter diagonalization for interior eigenvalue computations.
J. Comput. Phys., 2016
Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors.
CoRR, 2016
Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations.
Concurr. Comput. Pract. Exp., 2016
Exploring performance and power properties of modern multi-core chips via simple machine models.
Concurr. Comput. Pract. Exp., 2016
Concurr. Comput. Pract. Exp., 2016
Optimization of an Electromagnetics Code with Multicore Wavefront Diamond Blocking and Multi-dimensional Intra-Tile Parallelization.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks.
Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016
2015
SIAM J. Sci. Comput., 2015
SIAM J. Sci. Comput., 2015
Short Note on Costs of Floating Point Operations on current x86-64 Architectures: Denormals, Overflow, Underflow, and Division by Zero.
CoRR, 2015
Multi-dimensional intra-tile parallelization for memory-starved stencil computations.
CoRR, 2015
Performance analysis of the Kahan-enhanced scalar product on current multicore processors.
CoRR, 2015
Proceedings of the 6th International Workshop on Performance Modeling, 2015
Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
2014
A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units.
SIAM J. Sci. Comput., 2014
Domain-Specific Optimization of Two Jacobi Smoother Kernels and Their Evaluation in the ECM Performance Model.
Parallel Process. Lett., 2014
Modeling and analyzing performance for highly optimized propagation steps of the lattice Boltzmann method on sparse lattices.
CoRR, 2014
Towards energy efficiency and maximum computational intensity for stencil algorithms using wavefront diamond temporal blocking.
CoRR, 2014
Performance Engineering of the Kernel Polynomial Method on Large-Scale CPU-GPU Systems.
CoRR, 2014
Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips.
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014
Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator.
Proceedings of the ARCS 2014, 2014
2013
Parallel Process. Lett., 2013
Pushing the limits for medical image reconstruction on recent standard multicore processors.
Int. J. High Perform. Comput. Appl., 2013
An analysis of energy-optimized lattice-Boltzmann CFD simulations from the chip to the highly parallel level.
CoRR, 2013
CoRR, 2013
Comput. Math. Appl., 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the International Conference on High Performance Computing & Simulation, 2013
2012
SIAM J. Sci. Comput., 2012
Exploring performance and power properties of modern multicore chips via simple machine models.
CoRR, 2012
Best practices for HPM-assisted performance engineering on modern multicore processors.
CoRR, 2012
Sparse Matrix-vector Multiplication on GPGPU Clusters: A New Storage Format and a Scalable Implementation.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012
Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
2011
Hybrid-Parallel Sparse Matrix-Vector Multiplication with Explicit Communication Overlap on Current Multicore-Based Systems.
Parallel Process. Lett., 2011
A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters.
Parallel Comput., 2011
Efficient multicore-aware parallelization strategies for iterative stencil computations.
J. Comput. Sci., 2011
Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results.
CoRR, 2011
Domain decomposition and locality optimization for large-scale lattice Boltzmann simulations.
CoRR, 2011
CoRR, 2011
CoRR, 2011
Optimizing ccNUMA locality for task-parallel execution under OpenMP and TBB on multicore-based systems.
CoRR, 2011
Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA.
Adv. Eng. Softw., 2011
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011
likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes.
Proceedings of the Tools for High Performance Computing 2011, 2011
Parallel Sparse Matrix-Vector Multiplication as a Test Case for Hybrid MPI+OpenMP Programming.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Chapman and Hall / CRC computational science series, CRC Press, ISBN: 978-1-439-81192-4, 2011
2010
Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters.
Parallel Process. Lett., 2010
Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments.
Proceedings of the 39th International Conference on Parallel Processing, 2010
Proceedings of the Competence in High Performance Computing 2010, 2010
2009
Benchmark Analysis and Application Results for Lattice Boltzmann Simulations on NEC SX Vector and Intel Nehalem Systems.
Parallel Process. Lett., 2009
Multi-core architectures: Complexities of performance prediction and the impact of cache topology.
CoRR, 2009
Performance limitations for sparse matrix-vector multiplications on current multicore environments.
CoRR, 2009
Proceedings of the Parallel Processing and Applied Mathematics, 2009
Proceedings of the 17th Euromicro International Conference on Parallel, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization.
Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference, 2009
2008
Parallel Process. Lett., 2008
Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Vector Computers in a World of Commodity Clusters, Massively Parallel Systems and Many-Core Many-Threaded CPUs: Recent Experience Based on an Advanced Lattice Boltzmann Flow Solver.
Proceedings of the High Performance Computing in Science and Engineering '08, 2008
2007
RZBENCH: Performance evaluation of current HPC architectures using low-level and application benchmarks.
CoRR, 2007
2006
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006
2003
Exact Numerical Treatment of Finite Quantum Systems Using Leading-Edge Supercomputers.
Proceedings of the Modeling, 2003
2002
Proceedings of the High Performance Computing for Computational Science, 2002