Felix Wolf

Orcid: 0000-0001-6595-3599

Affiliations:
  • TU Darmstadt, Laboratory for Parallel Programming, Darmstadt, Germany
  • RWTH Aachen University, Germany (PhD 2003)


According to our database, Felix Wolf authored at least 199 papers between 1999 and 2024.

Bibliography

2024
Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities.
IEEE Trans. Parallel Distributed Syst., September, 2024

Fast data-dependence profiling through prior static analysis.
Parallel Comput., 2024

Corrigendum: Building a realistic, scalable memory model with independent engrams using a homeostatic mechanism.
Frontiers Neuroinformatics, 2024

Building a realistic, scalable memory model with independent engrams using a homeostatic mechanism.
Frontiers Neuroinformatics, 2024

Capturing Periodic I/O Using Frequency Techniques.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Dissecting Convolutional Neural Networks for Runtime and Scalability Prediction.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

I/O Behind the Scenes: Bandwidth Requirements of HPC Applications with Asynchronous I/O.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
Extra-P: Automated performance modeling for HPC applications.
Dataset, November, 2023

Performance Measurement Dataset of the HPC Benchmarks FASTEST, Kripke, LULESH, MiniFE, Quicksilver, and RELeARN for Scalability Studies with Extra-P.
Dataset, November, 2023

Simulating structural plasticity of the brain more scalable than expected.
J. Parallel Distributed Comput., January, 2023

FTIO: Detecting I/O Periodicity Using Frequency Techniques.
CoRR, 2023

Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling.
Proceedings of the High Performance Computing, 2023

Extra-Deep: Automated Empirical Performance Modeling for Distributed Deep Learning.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Filtering and Ranking of Code Regions for Parallelization via Hotspot Detection and OpenMP Overhead Analysis.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Satellite Collision Detection using Spatial Data Structures.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Adaptive multi-tier intelligent data manager for Exascale.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022
Keeping up with technology: Teaching parallel, distributed, and high-performance computing.
J. Parallel Distributed Comput., 2022

Conquering Noise With Hardware Counters on HPC Systems.
Proceedings of the IEEE/ACM Workshop on Programming and Performance Visualization Tools, 2022

ElastiSim: A Batch-System Simulator for Malleable Workloads.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Accelerating Brain Simulations with the Fast Multipole Method.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022

Multi-objective Hybrid Autoscaling of Microservices in Kubernetes Clusters.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022

2021
Design-time performance modeling of compositional parallel programs.
Parallel Comput., 2021

Extracting clean performance models from tainted programs.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Learning to make compiler optimizations more effective.
Proceedings of the MAPS@PLDI 2021: Proceedings of the 5th ACM SIGPLAN International Symposium on Machine Programming, 2021

Noise-Resilient Empirical Performance Modeling with Deep Neural Networks.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Tool-Supported Mini-App Extraction to Facilitate Program Analysis and Parallelization.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2020
ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications.
Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Static Neural Compiler Optimization via Deep Reinforcement Learning.
CoRR, 2020

Dynamic Multi-objective Scheduling of Microservices in the Cloud.
Proceedings of the 13th IEEE/ACM International Conference on Utility and Cloud Computing, 2020

Empirical Modeling of Spatially Diverging Performance.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

Safer Parallelization.
Proceedings of the Leveraging Applications of Formal Methods, Verification and Validation: Engineering Principles, 2020

Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Accelerating winograd convolutions using symbolic computation and meta-programming.
Proceedings of the EuroSys '20: Fifteenth EuroSys Conference 2020, 2020

Efficient Ephemeris Models for Spacecraft Trajectory Simulations on GPUs.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

Skipping Non-essential Instructions Makes Data-Dependence Profiling Faster.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019
Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations.
IEEE Trans. Parallel Distributed Syst., 2019

The Art of Getting Deep Neural Networks in Shape.
ACM Trans. Archit. Code Optim., 2019

How File-access Patterns Influence the Degree of I/O Interference between Cluster Applications.
Supercomput. Front. Innov., 2019

Dissecting sequential programs for parallelization - An approach based on computational units.
Concurr. Comput. Pract. Exp., 2019

Automatic Instrumentation Refinement for Empirical Performance Modeling.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Designing Efficient Parallel Software via Compositional Performance Modeling.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Automatic construct selection and variable classification in OpenMP.
Proceedings of the ACM International Conference on Supercomputing, 2019

Enhancing the Programmability and Performance Portability of GPU Tensor Operations.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019

Accelerating Data-Dependence Profiling with Static Hints.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019

Efficient Job Scheduling for Clusters with Shared Tiered Storage.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

A Container-Driven Approach for Resource Provisioning in Edge-Fog Cloud.
Proceedings of the Algorithmic Aspects of Cloud Computing - 5th International Symposium, 2019

2018
Scalasca analysis report of the ASCI Sweep3D benchmark on 294,912 processes in virtual-node mode on IBM Blue Gene/P with manually annotated iterations.
Dataset, August, 2018

Scalasca analysis report of the ASCI Sweep3D benchmark on 65,536 processes in virtual-node mode on IBM Blue Gene/P.
Dataset, April, 2018

Scalasca analysis report of the ASCI Sweep3D benchmark on 294,912 processes in virtual-node mode on IBM Blue Gene/P.
Dataset, April, 2018

Scalasca analysis report for SPEC MPI2007 benchmark 132.zeusmp2 on 512 processes in virtual-node mode on Blue Gene/P.
Dataset, April, 2018

A scalable algorithm for simulating the structural plasticity of the brain.
J. Parallel Distributed Comput., 2018

Understanding the Scalability of Molecular Simulation Using Empirical Performance Modeling.
Proceedings of the Programming and Performance Visualization Tools, 2018

Using Deep Learning for Automated Communication Pattern Characterization: Little Steps and Big Challenges.
Proceedings of the Programming and Performance Visualization Tools, 2018

Unveiling Thread Communication Bottlenecks Using Hardware-Independent Metrics.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Estimating the Impact of External Interference on Application Performance.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Exploring the Performance Envelope of the LLL Algorithm.
Proceedings of the 2018 IEEE International Conference on Computational Science and Engineering, 2018

Lightweight Requirements Engineering for Exascale Co-design.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Efficient Fault Tolerance Through Dynamic Node Replacement.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Editorial of special issue on Software Engineering for Parallel Systems.
J. Syst. Softw., 2017

Brief Announcement: Meeting the Challenges of Parallelizing Sequential Programs.
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Parallelizing Audio Analysis Applications - A Case Study.
Proceedings of the 39th IEEE/ACM International Conference on Software Engineering: Software Engineering Education and Training Track, 2017

Following the Blind Seer - Creating Better Performance Models Using Less Information.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

Off-Road Performance Modeling - How to Deal with Segmented Data.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
Automatic Performance Modeling of HPC Applications.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Automated Performance Modeling of the UG4 Simulation Framework.
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications.
ACM Trans. Parallel Comput., 2016

Unveiling parallelization opportunities in sequential programs.
J. Syst. Softw., 2016

Automatic Generation of Unit Tests for Correlated Variables in Parallel Programs.
Int. J. Parallel Program., 2016

Automatic Parallel Pattern Detection in the Algorithm Structure Design Space.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Fast Multi-parameter Performance Modeling.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015
Separating the wheat from the chaff: identifying relevant and similar performance data with visual analytics.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Preventing the explosion of exascale profile data with smart thread-level aggregation.
Proceedings of the 4th Workshop on Extreme Scale Programming Tools, 2015

A Batch System with Efficient Adaptive Scheduling for Malleable and Evolving Applications.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

An Efficient Data-Dependence Profiler for Sequential and Parallel Programs.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Exascaling Your Library: Will Your Implementation Meet Your Expectations?
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Characterizing Loop-Level Communication Patterns in Shared Memory.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Beyond Data Parallelism: Identifying Parallel Tasks in Sequential Programs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

Fast Data-Dependence Profiling by Skipping Repeatedly Executed Memory Operations.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2015

10,000 Performance Models per Minute - Scalability of the UG4 Simulation Framework.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

How Many Threads will be too Many? On the Scalability of OpenMP Implementations.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Dependence-Based Code Transformation for Coarse-Grained Parallelism.
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, 2015

The Basic Building Blocks of Parallel Tasks.
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, 2015

2014
Using Template Matching to Infer Parallel Design Patterns.
ACM Trans. Archit. Code Optim., 2014

Special issue: Euro-Par 2013.
Concurr. Comput. Pract. Exp., 2014

Generating Classified Parallel Unit Tests.
Proceedings of the Tests and Proofs - 8th International Conference, 2014

Down to earth: how to visualize traffic on high-dimensional torus networks.
Proceedings of the First Workshop on Visual Performance Analysis, 2014

Catching Idlers with Ease: A Lightweight Wait-State Profiler for MPI Programs.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

SEPS 2014: first international workshop on software engineering for parallel systems.
Proceedings of the SPLASH'14, 2014

A Comparison between OPARI2 and the OpenMP Tools Interface in the Context of Score-P.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

A Batch System with Fair Scheduling for Evolving Applications.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Catwalk: A Quick Development Path for Performance Models.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

How file access patterns influence interference among cluster applications.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
A scalable infrastructure for the performance analysis of passive target synchronization.
Parallel Comput., 2013

Parallel universal access layer: A scalable I/O library for integrated tokamak modeling.
Comput. Phys. Commun., 2013

Extending the scope of the controlled logical clock.
Clust. Comput., 2013

Using automated performance modeling to find scalability bugs in complex codes.
Proceedings of the International Conference for High Performance Computing, 2013

Understanding the formation of wait states in applications with one-sided communication.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Massively parallel loading.
Proceedings of the International Conference on Supercomputing, 2013

Efficient Offloading of Parallel Kernels Using MPI_Comm_Spawn.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

A Dynamic Resource Management System for Network-Attached Accelerator Clusters.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Discovery of Potential Parallelism in Sequential Programs.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Detecting Correlation Violations and Data Races by Inferring Non-deterministic Reads.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

Predicting Parallelization of Sequential Programs Using Supervised Learning.
Proceedings of the 12th International Conference on Machine Learning and Applications, 2013

Capturing inter-application interference on clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012
Scalable detection of MPI-2 remote memory access inefficiency patterns.
Int. J. High Perform. Comput. Appl., 2012

The HOPSA Workflow and Tools.
Proceedings of the Tools for High Performance Computing 2012, 2012

Generic Support for Remote Memory Access Operations in Score-P and OTF2.
Proceedings of the Tools for High Performance Computing 2012, 2012

Performance Analysis Techniques for Task-Based OpenMP Applications.
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012

Dynamic Load Balancing for Unstructured Meshes on Space-Filling Curves.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Scalable Critical-Path Based Performance Analysis.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Characterizing Load and Communication Imbalance in Large-Scale Parallel Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A Dynamic Accelerator-Cluster Architecture.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Profiling of OpenMP Tasks with Score-P.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Pattern-Independent Detection of Manual Collectives in MPI Programs.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Scalasca.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Parallel Sorting with Minimal Data.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Scaling Performance Tool MPI Communicator Management.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir.
Proceedings of the Tools for High Performance Computing 2011, 2011

Patterns of Inefficient Performance Behavior in GPU Applications.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

Open Trace Format 2: The Next Generation of Scalable Trace Formats and Support Libraries.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Performance Analysis of Long-Running Applications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Reconciling Sampling and Direct Instrumentation for Unintrusive Call-Path Profiling of MPI Programs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Reducing the Overhead of Direct Application Instrumentation Using Prior Static Analysis.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Score-P.
Proceedings of the Entwicklung und Evolution von Forschungssoftware: Tagungsband des Workshops, 2011

Scalasca.
Proceedings of the Entwicklung und Evolution von Forschungssoftware: Tagungsband des Workshops, 2011

2010
Large-Scale Performance Analysis of Sweep3D with the Scalasca Toolset.
Parallel Process. Lett., 2010

Performance measurement and analysis tools for extremely scalable systems.
Concurr. Comput. Pract. Exp., 2010

The Scalasca performance toolset architecture.
Concurr. Comput. Pract. Exp., 2010

Further Improving the Scalability of the Scalasca Toolset.
Proceedings of the Applied Parallel and Scientific Computing, 2010

How to Reconcile Event-Based Performance Analysis with Tasking in OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Proceedings of the 15th international workshop on high-level parallel programming models and supportive environments.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Improvements of common open Grid standards to increase High Throughput and High Performance Computing effectiveness on large-scale Grid and e-science infrastructures.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Identifying the Root Causes of Wait States in Large-Scale Parallel Applications.
Proceedings of the 39th International Conference on Parallel Processing, 2010

PROPER 2010: Third Workshop on Productivity and Performance - Tools for HPC Application Development.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

Synchronizing the Timestamps of Concurrent Events in Traces of Hybrid MPI/OpenMP Applications.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Score-P: A Unified Performance Measurement System for Petascale Applications.
Proceedings of the Competence in High Performance Computing 2010, 2010

Exploring the Potential of Using Multiple E-science Infrastructures with Emerging Open Standards-Based E-health Research Tools.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

Experiences and Requirements for Interoperability Between HTC and HPC-driven e-Science Infrastructure.
Proceedings of the Future Application and Middleware Technology on e-Science, 2010

2009
Replay-based synchronization of timestamps in event traces of massively parallel applications.
Scalable Comput. Pract. Exp., 2009

A scalable tool architecture for diagnosing wait states in massively parallel applications.
Parallel Comput., 2009

Scalable timestamp synchronization for event traces of message-passing applications.
Parallel Comput., 2009

Interoperation of world-wide production e-Science infrastructures.
Concurr. Comput. Pract. Exp., 2009

Research advances by using interoperable e-science infrastructures.
Clust. Comput., 2009

Space-efficient time-series call-path profiling of parallel applications.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Scalable massively parallel I/O to task-local files.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

Recent Developments in the Scalasca Toolset.
Proceedings of the Tools for High Performance Computing 2009, 2009

Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

A Generic and Configurable Source-Code Instrumentation Component.
Proceedings of the Computational Science, 2009

Introduction.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

PROPER 2009: Workshop on Productivity and Performance - Tools for HPC Application Development.
Proceedings of the Euro-Par 2009, 2009

Performance Simulation of Non-blocking Communication in Message-Passing Applications.
Proceedings of the Euro-Par 2009, 2009

Enabling Grid Interoperability by Extending HPC-driven Job Management with an Open Standard Information Model.
Proceedings of the 8th IEEE/ACIS International Conference on Computer and Information Science, 2009

2008
Performance measurement and analysis of large-scale parallel applications on leadership computing systems.
Sci. Program., 2008

SCALASCA Parallel Performance Analyses of SPEC MPI2007 Applications.
Proceedings of the Performance Evaluation: Metrics, 2008

Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications.
Proceedings of the Tools for High Performance Computing, 2008

Performance Evaluation and Optimization of Parallel Grid Computing Applications.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Extending the collaborative online visualization and steering framework for computational Grids with attribute-based authorization.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Scalasca Parallel Performance Analyses of PEPC.
Proceedings of the Euro-Par 2008 Workshops, 2008

Classification of Different Approaches for e-Science Applications in Next Generation Computing Infrastructures.
Proceedings of the Fourth International Conference on e-Science, 2008

Grid-Based Workflow Management.
Proceedings of the Grid and Services Evolution, 2008

Implications of non-constant clock drifts for the timestamps of concurrent events.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007
Compensation of Measurement Overhead in Parallel Performance Profiling.
Int. J. High Perform. Comput. Appl., 2007

Automatic analysis of inefficiency patterns in parallel applications.
Concurr. Comput. Pract. Exp., 2007

Timestamp Synchronization for Event Traces of Large-Scale Message-Passing Applications.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Scalability and Usability of HPC Programming Tools.
Proceedings of the Parallel Computing: Architectures, 2007

Scalable Collation and Presentation of Call-Path Profile Data with CUBE.
Proceedings of the Parallel Computing: Architectures, 2007

Automatic Trace-Based Performance Analysis of Metacomputing Applications.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Design and evaluation of a collaborative online visualization and steering framework implementation for computational grids.
Proceedings of the 8th IEEE/ACM International Conference on Grid Computing (GRID 2007), 2007

Computational Steering and Online Visualization of Scientific Applications on Large-Scale HPC Systems within e-Science Infrastructures.
Proceedings of the Third International Conference on e-Science and Grid Computing, 2007

2006
Performance Tools for Parallel Programming.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Scalable Parallel Trace-Based Performance Analysis.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Integrated Runtime Measurement Summarisation and Selective Event Tracing for Scalable Parallel Execution Performance Diagnosis.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Tools for Parallel Performance Analysis: Minisymposium Abstract.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

A Parallel Trace-Data Interface for Scalable Performance Analysis.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006

A systematic multi-step methodology for performance analysis of communication traces of distributed applications based on hierarchical clustering.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Specification of Inefficiency Patterns for MPI-2 One-Sided Communication.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Large Event Traces in Parallel Performance Analysis.
Proceedings of the ARCS 2006, 2006

2005
Performance Profiling Overhead Compensation for MPI Programs.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

A Scalable Approach to MPI Application Performance Analysis.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Holistic Hardware Counter Performance Analysis of Parallel Programs.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Performance Analysis of One-sided Communication Mechanisms.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Automatic Experimental Analysis of Communication Patterns in Virtual Topologies.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Trace-Based Parallel Performance Overhead Compensation.
Proceedings of the High Performance Computing and Communications, 2005

Event-Based Measurement and Analysis of One-Sided Communication.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004
An Algebra for Cross-Experiment Performance Analysis.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Efficient Pattern Search in Large Traces Through Successive Refinement.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
Automatic performance analysis on parallel computers with SMP nodes.
PhD thesis, 2003

Automatic performance analysis of hybrid MPI/OpenMP applications.
J. Syst. Archit., 2003

Hardware-Counter Based Automatic Performance Analysis of Parallel Programs.
Proceedings of the Parallel Computing: Software Technology, 2003

KOJAK - A Tool Set for Automatic Performance Analysis of Parallel Programs.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

2002
Design and Prototype of a Performance Tool Interface for OpenMP.
J. Supercomput., 2002

CATCH - A Call-Graph Based Automatic Tool for Capture of Hardware Performance Metrics for MPI and OpenMP Applications.
Proceedings of the Euro-Par 2002, 2002

2001
Specifying Performance Properties of Parallel Applications Using Compound Events.
Parallel Distributed Comput. Pract., 2001

2000
Automatic Performance Analysis of MPI Applications Based on Event Traces.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
Performance analysis on CRAY T3E.
Proceedings of the Seventh Euromicro Workshop on Parallel and Distributed Processing. PDP'99, 1999

EARL - A Programmable and Extensible Toolkit for Analyzing Event Traces of Message Passing Programs.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

