Matthias Diener

ORCID: 0000-0002-9064-7806

According to our database, Matthias Diener authored at least 78 papers between 2010 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of two.

Bibliography

2023
Mitigating execution unit contention in parallel applications using instruction-aware mapping.
Concurr. Comput. Pract. Exp., 2023

2022
Thread and Data Mapping in Software Transactional Memory: An Overview.
CoRR, 2022

2021
Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit.
ACM Trans. Model. Perform. Evaluation Comput. Syst., 2021

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py.
CoRR, 2021

Sharing-Aware Data Mapping in Software Transactional Memory.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

Performance Evaluation of Python Parallel Programming Models: Charm4Py and mpi4py.
Proceedings of the 6th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2021

Current Requirements and Implementations in the field of Web Tracking in the pre-age of E-Privacy Regulation.
Proceedings of the 29th European Conference on Information Systems, 2021

2020
Heterogeneous computing with OpenMP and Hydra.
Concurr. Comput. Pract. Exp., 2020

Online Sharing-Aware Thread Mapping in Software Transactional Memory.
Proceedings of the 32nd IEEE International Symposium on Computer Architecture and High Performance Computing, 2020

Thread Affinity in Software Transactional Memory.
Proceedings of the 19th International Symposium on Parallel and Distributed Computing, 2020

Unified data movement for offloading Charm++ applications.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Characterizing the Sharing Behavior of Applications Using Software Transactional Memory.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2020

2019
EagerMap: A Task Mapping Algorithm to Improve Communication and Load Balancing in Clusters of Multicore Systems.
ACM Trans. Parallel Comput., 2019

Optimization strategies for geophysics models on manycore systems.
Int. J. High Perform. Comput. Appl., 2019

Managing Power Demand and Load Imbalance to Save Energy on Systems with Heterogeneous CPU Speeds.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Memory Performance and Bottlenecks in Multicore and GPU Architectures.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Multi-phased Task Placement of HPC Applications in the Cloud.
Proceedings of the 18th International Symposium on Parallel and Distributed Computing, 2019

Exploring Instance Heterogeneity in Public Cloud Providers for HPC Applications.
Proceedings of the 9th International Conference on Cloud Computing and Services Science, 2019

2018
Thread and Data Mapping for Multicore Systems - Improving Communication and Memory Accesses
Springer Briefs in Computer Science, Springer, ISBN: 978-3-319-91073-4, 2018

Accelerating Scientific Applications on Heterogeneous Systems with HybridOMP.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2018, 2018

Improving Communication and Load Balancing with Thread Mapping in Manycore Systems.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Exploiting Load Imbalance Patterns for Heterogeneous Cloud Computing Platforms.
Proceedings of the 8th International Conference on Cloud Computing and Services Science, 2018

Multi-Level Load Balancing with an Integrated Runtime Approach.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Modeling memory access behavior for data mapping.
Int. J. High Perform. Comput. Appl., 2017

Affinity-Based Thread and Data Mapping in Shared Memory Systems.
ACM Comput. Surv., 2017

Exploiting Price and Performance Tradeoffs in Heterogeneous Clouds.
Proceedings of the Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

Visualizing, Measuring, and Tuning Adaptive MPI Parameters.
Proceedings of the Programming and Performance Visualization Tools, 2017

Integrating OpenMP into the Charm++ Programming Model.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Strategies to Improve the Performance of a Geophysics Model for Different Manycore Systems.
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Improving the memory access locality of hybrid MPI applications.
Proceedings of the 24th European MPI Users' Group Meeting, 2017

HPC Application Performance and Cost Efficiency in the Cloud.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

Data mining the memory access stream to detect anomalous application behavior.
Proceedings of the Computing Frontiers Conference, 2017

Optimizing memory affinity with a hybrid compiler/OS approach.
Proceedings of the Computing Frontiers Conference, 2017

2016
Kernel-Based Thread and Data Mapping for Improved Memory Affinity.
IEEE Trans. Parallel Distributed Syst., 2016

Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures.
ACM Trans. Archit. Code Optim., 2016

A dynamic block-level execution profiler.
Parallel Comput., 2016

LAPT: A locality-aware page table for thread and data mapping.
Parallel Comput., 2016

Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Communication in Shared Memory: Concepts, Definitions, and Efficient Detection.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Large vector extensions inside the HMC.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Automatic Communication Optimization of Parallel Applications in Public Clouds.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Performance Evaluation of Multiple Cloud Data Centers Allocations for HPC.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

2015
Automatic task and data mapping in shared memory architectures.
PhD thesis, 2015

Characterizing communication and page usage of parallel applications for thread and data mapping.
Perform. Evaluation, 2015

Communication-aware process and thread mapping using online communication detection.
Parallel Comput., 2015

Communication-aware thread mapping using the translation lookaside buffer.
Concurr. Comput. Pract. Exp., 2015

TABARNAC: visualizing and resolving memory access issues on NUMA architectures.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Partial coscheduling of virtual machines based on memory access patterns.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Reconfigurable Vector Extensions inside the DRAM.
Proceedings of the 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip, 2015

Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

An Efficient Algorithm for Communication-Based Task Mapping.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Opportunities and Challenges of Performing Vector Operations inside the DRAM.
Proceedings of the 2015 International Symposium on Memory Systems, 2015

SiNUCA: A Validated Micro-Architecture Simulator.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Locality and Balance for Communication-Aware Thread Mapping in Multicore Systems.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Saving memory movements through vector processing in the DRAM.
Proceedings of the 2015 International Conference on Compilers, 2015

2014
Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols.
J. Parallel Distributed Comput., 2014

Optimizing Memory Locality Using a Locality-Aware Page Table.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

kMAF: automatic kernel-level management of thread and data affinity.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Energy Efficient Last Level Caches via Last Read/Write Prediction.
Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

Analyzing resource interdependencies in multi-core architectures to improve scheduling decisions.
Proceedings of the 28th Annual ACM Symposium on Applied Computing, 2013

Communication-Based Mapping Using Shared Pages.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

High Performance Computing in the cloud: Deployment, performance and cost efficiency.
Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012

Evaluating High Performance Computing on the Windows Azure Platform.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

2010
Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

