Jan Eitzinger

Rafael Ravedutti Lucio Machado

T. Gruber

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

2023

MD-Bench: A performance-focused prototyping harness for state-of-the-art short-range molecular dynamics algorithms.

Future Gener. Comput. Syst., December, 2023

MD-Bench: Engineering the in-core performance of short-range molecular dynamics kernels from state-of-the-art simulation packages.

Rafael Ravedutti Lucio Machado

CoRR, 2023

2022

MD-Bench: A Generic Proxy-App Toolbox for State-of-the-Art Molecular Dynamics Algorithms.

Rafael Ravedutti Lucio Machado

Harald Köstler

Rafael Ravedutti L. Machado

Proceedings of the Parallel Processing and Applied Mathematics, 2022

2021

An instrumentation framework for performance analysis of Halide schedules.

Rafael Ravedutti L. Machado

André Murbach Maidl

Daniel Weingaertner

J. Comput. Lang., 2021

tinyMD: Mapping molecular dynamics simulations to heterogeneous hardware using partial evaluation.

J. Comput. Sci., 2021

2020

tinyMD: A Portable and Scalable Implementation for Pairwise Interactions Simulations.

Rafael Ravedutti L. Machado

CoRR, 2020

2019

ClusterCockpit - A web application for job-specific performance monitoring.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018

Unified Code Generation for the Parallel Computation of Pairwise Interactions Using Partial Evaluation.

Proceedings of the 17th International Symposium on Parallel and Distributed Computing, 2018

2017

Validation of hardware events for successful performance pattern identification in High Performance Computing.

CoRR, 2017

Kerncraft: A Tool for Analytic Performance Modeling of Loop Kernels.

CoRR, 2017

Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors.

Concurr. Comput. Pract. Exp., 2017

LIKWID Monitoring Stack: A Flexible Framework Enabling Job Specific Performance monitoring for the masses.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

Performance analysis of the Kahan-enhanced scalar product on current multi- and manycore processors.

CoRR, 2016

Chip-level and multi-node analysis of energy-optimized lattice Boltzmann CFD simulations.

Concurr. Comput. Pract. Exp., 2016

Exploring performance and power properties of modern multi-core chips via simple machine models.

Concurr. Comput. Pract. Exp., 2016

Analysis of Intel's Haswell Microarchitecture Using the ECM Model and Microbenchmarks.

Proceedings of the Architecture of Computing Systems - ARCS 2016, 2016

2015

Performance analysis of the Kahan-enhanced scalar product on current multicore processors.

CoRR, 2015

Execution-Cache-Memory Performance Model: Introduction and Validation.

Johannes Hofmann

Dietmar Fey

CoRR, 2015

Automatic loop kernel analysis and performance modeling with Kerncraft.

Proceedings of the 6th International Workshop on Performance Modeling, 2015

Performance Analysis of the Kahan-Enhanced Scalar Product on Current Multicore Processors.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

Quantifying Performance Bottlenecks of Stencil Computations Using the Execution-Cache-Memory Model.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014

Tools and methods for measuring and tuning the energy efficiency of HPC systems.

Sci. Program., 2014

Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips.

Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, 2014

Overhead Analysis of Performance Counter Measurements.

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator.

Proceedings of the ARCS 2014, 2014

2013

Pushing the limits for medical image reconstruction on recent standard multicore processors.

Int. J. High Perform. Comput. Appl., 2013

Optimization of FASTEST-3D for Modern Multicore Systems.

CoRR, 2013

Optimizing IBM algorithmics' mark-to-future aggregation engine for real-time counterparty credit risk scoring.

Proceedings of WHPCF'13: 6th Workshop on High Performance Computational Finance, 2013

Topic 11: Multicore and Manycore Programming - (Introduction).

Luiz De Rose

Alba Cristina Magalhaes Alves de Melo

William Jalby

David Abramson

Alastair F. Donaldson

Tomàs Margalef

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012

Expression Templates Revisited: A Performance Analysis of Current Methodologies.

SIAM J. Sci. Comput., 2012

Exploring performance and power properties of modern multicore chips via simple machine models.

CoRR, 2012

Best practices for HPM-assisted performance engineering on modern multicore processors.

CoRR, 2012

High performance smart expression template math libraries.

Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Performance Patterns and Hardware Metrics on Modern Multicore Processors: Best Practices for Performance Engineering.

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

2011

Efficient multicore-aware parallelization strategies for iterative stencil computations.

J. Comput. Sci., 2011

Expression Templates Revisited: A Performance Analysis of the Current ET Methodology.

CoRR, 2011

Poster: LIKWID: lightweight performance tools.

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes.

Proceedings of the Tools for High Performance Computing 2011, 2011

2010

Leveraging Shared Caches for Parallel Temporal Blocking of Stencil Codes on Multicore Processors and Clusters.

Parallel Process. Lett., 2010

LIKWID: A Lightweight Performance-Oriented Tool Suite for x86 Multicore Environments.

Proceedings of the 39th International Conference on Parallel Processing, 2010

LIKWID: Lightweight Performance Tools.

Proceedings of the Competence in High Performance Computing 2010, 2010

2009

Multi-core architectures: Complexities of performance prediction and the impact of cache topology.

CoRR, 2009

Introducing a Performance Model for Bandwidth-Limited Loop Kernels.

Proceedings of the Parallel Processing and Applied Mathematics, 2009

2008

Efficiency improvements of iterative numerical algorithms on modern architectures.

PhD thesis, 2008

Optimising a 3D multigrid algorithm for the IA-64 architecture.

Markus Stürmer

Ulrich Rüde

Int. J. Comput. Sci. Eng., 2008

2006

ORCAN: A platform for complex parallel simulation software.
