Matthias Diener

Israel Koren

Proceedings of the Computing Frontiers Conference, 2017

Optimizing memory affinity with a hybrid compiler/OS approach.

Proceedings of the Computing Frontiers Conference, 2017

2016

Kernel-Based Thread and Data Mapping for Improved Memory Affinity.

Hans-Ulrich Heiss

IEEE Trans. Parallel Distributed Syst., 2016

Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures.

ACM Trans. Archit. Code Optim., 2016

A dynamic block-level execution profiler.

Francis B. Moreira

Israel Koren

Parallel Comput., 2016

LAPT: A locality-aware page table for thread and data mapping.

Parallel Comput., 2016

Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency.

Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines.

Artur Mariano

Christian H. Bischof

Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Communication in Shared Memory: Concepts, Definitions, and Efficient Detection.

Proceedings of the 24th Euromicro International Conference on Parallel, 2016

A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures.

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Large vector extensions inside the HMC.

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Automatic Communication Optimization of Parallel Applications in Public Clouds.

Emmanuell D. Carreño

Jimmy K. M. Valverde-Sánchez

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Performance Evaluation of Multiple Cloud Data Centers Allocations for HPC.

Eduardo Roloff

Emmanuell Diaz Carreño

Matheus da Silva Serpa

Guillaume Houzeaux

Lucas Mello Schnorr

Nicolas Maillard

Luciano Paschoal Gaspary

Proceedings of the High Performance Computing - Third Latin American Conference, 2016

2015

Automatic task and data mapping in shared memory architectures.

PhD thesis, 2015

Characterizing communication and page usage of parallel applications for thread and data mapping.

Fabrice Dupros

Perform. Evaluation, 2015

Communication-aware process and thread mapping using online communication detection.

Parallel Comput., 2015

Communication-aware thread mapping using the translation lookaside buffer.

Concurr. Comput. Pract. Exp., 2015

TABARNAC: visualizing and resolving memory access issues on NUMA architectures.

David Beniamine

Guillaume Huard

Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Partial coscheduling of virtual machines based on memory access patterns.

Jan Hendrik Schönherr

Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Reconfigurable Vector Extensions inside the DRAM.

Proceedings of the 10th International Symposium on Reconfigurable Communication-centric Systems-on-Chip, 2015

Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems.

Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

An Efficient Algorithm for Communication-Based Task Mapping.

Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Opportunities and Challenges of Performing Vector Operations inside the DRAM.

Proceedings of the 2015 International Symposium on Memory Systems, 2015

SiNUCA: A Validated Micro-Architecture Simulator.

Carlos Villavieja

Francis Birck Moreira

Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Locality and Balance for Communication-Aware Thread Mapping in Multicore Systems.

Mohammad S. Alhakeem

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Saving memory movements through vector processing in the DRAM.

Proceedings of the 2015 International Conference on Compilers, 2015

2014

Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols.

J. Parallel Distributed Comput., 2014

Optimizing Memory Locality Using a Locality-Aware Page Table.

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

kMAF: automatic kernel-level management of thread and data affinity.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Energy Efficient Last Level Caches via Last Read/Write Prediction.

Carlos Villavieja

Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

Analyzing resource interdependencies in multi-core architectures to improve scheduling decisions.

Jan Hendrik Schönherr

Gero Mühl

Jan Richling

Proceedings of the 28th Annual ACM Symposium on Applied Computing, 2013

Communication-Based Mapping Using Shared Pages.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

High Performance Computing in the cloud: Deployment, performance and cost efficiency.

Eduardo Roloff

Alexandre Carissimi

Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012

Evaluating High Performance Computing on the Windows Azure Platform.

Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

2010

Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors.

Felipe Lopes Madruga

Eduardo Rocha Rodrigues

Jörg Schneider