Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

A Selective Nesting Approach for the Sparse Multi-threaded Cholesky Factorization.

Valentin Le Fèvre

Tetsuzo Usui

Alexandre E. Eichenberger

Proceedings of the 7th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2022

2021

Intelligent Adaptation of Hardware Knobs for Improving Performance and Power Consumption.

Pradip Bose

Miquel Moretó

IEEE Trans. Computers, 2021

Efficiently running SpMV on long vector architectures.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Multilevel simulation-based co-design of next generation HPC microprocessors.

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Morrigan: A Composite Instruction TLB Prefetcher.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Exploiting Page Table Locality for Agile TLB Prefetching.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Dynamically Adapting Floating-Point Precision to Accelerate Deep Neural Network Training.

Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

Cache-aware Sparse Patterns for the Factorized Sparse Approximate Inverse Preconditioner.

Sergi Laut

Ricard Borrell

Rekai González-Alberquilla

Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy.

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

2020

Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies.

J. Supercomput., 2020

Iteration-fusing conjugate gradient for sparse linear systems with MPI + OmpSs.

J. Supercomput., 2020

Using Arm's scalable vector extension on stencil codes.

J. Supercomput., 2020

Semi-automatic validation of cycle-accurate simulation infrastructures: The case for gem5-x86.

Future Gener. Comput. Syst., 2020

Generating Efficient DNN-Ensembles with Evolutionary Computation.

Marc Ortiz

Florian Scheidegger

A. Cristiano I. Malossi

Eduard Ayguadé

CoRR, 2020

Reducing Data Motion to Accelerate the Training of Deep Neural Networks.

Sicong Zhuang

A. Cristiano I. Malossi

CoRR, 2020

Runtime-guided ECC protection using online estimation of memory vulnerability.

Proceedings of the International Conference for High Performance Computing, 2020

Cost-aware prediction of uncorrected DRAM errors in the field.

Proceedings of the International Conference for High Performance Computing, 2020

Characterizing the impact of last-level cache replacement policies on big-data workloads.

Alexandre Valentin Jamet

Lluc Alvarez

Daniel A. Jiménez

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Wavefront parallelization of recurrent neural networks on multi-core architectures.

Robin Kumar Sharma

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

RICH: implementing reductions in the cache hierarchy.

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Modeling and optimizing NUMA effects and prefetching with machine learning.

David Black-Schaffer

Miquel Moretó

Anastasiia Stupnikova

Mihail Popov

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Evaluating Mixed-Precision Arithmetic for 3D Generative Adversarial Networks to Simulate High Energy Physics Detectors.

Proceedings of the 19th IEEE International Conference on Machine Learning and Applications, 2020

Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

2019

Design trade-offs for emerging HPC processors based on mobile market technology.

Adrià Armejach

Miquel Moretó

J. Supercomput., 2019

Sampled Simulation of Task-Based Programs.

IEEE Trans. Computers, 2019

Special issue on the message passing interface.

Pavan Balaji

Parallel Comput., 2019

On the maturity of parallel applications for asymmetric multi-core processors.

J. Parallel Distributed Comput., 2019

Resilient gossip-inspired all-reduce algorithms for high-performance computing: Potential, limitations, and open questions.

Wilfried N. Gansterer

Elias Wimmer

Int. J. High Perform. Comput. Appl., 2019

Optimizing computation-communication overlap in asynchronous task-based programs: poster.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

On the Benefits of Tasking with OpenMP.

Alejandro Rico

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Design Space Exploration of Next-Generation HPC Machines.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A Vulnerability Factor for ECC-protected Memory.

Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

Open-Source Shared Memory implementation of the HPCG benchmark: analysis, improvements and evaluation on Cavium ThunderX2.

Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Power efficient job scheduling by predicting the impact of processor manufacturing variability.

Proceedings of the ACM International Conference on Supercomputing, 2019

Optimizing computation-communication overlap in asynchronous task-based programs.

Proceedings of the ACM International Conference on Supercomputing, 2019

Convolutional Neural Network Training with Dynamic Epoch Ordering.

Ferran Plana Rius

Cecilio Angulo Bahón

Josep Maria Mirats Tur

Proceedings of the Artificial Intelligence Research and Development, 2019

POSTER: An Optimized Predication Execution for SIMD Extensions.

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Asynchronous and Exact Forward Recovery for Detected Errors in Iterative Solvers.

IEEE Trans. Parallel Distributed Syst., 2018

Reducing Cache Coherence Traffic with a NUMA-Aware Runtime Approach.

IEEE Trans. Parallel Distributed Syst., 2018

Performance and energy effects on task-based parallelized applications - User-directed versus manual vectorization.

J. Supercomput., 2018

Memory Vulnerability: A Case for Delaying Error Reporting.

CoRR, 2018

Low-Precision Floating-Point Schemes for Neural Network Training.

CoRR, 2018

TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism.

Proceedings of the High Performance Computing - 33rd International Conference, 2018

Approximating a Multi-Grid Solver.

Valentin Le Fèvre

Leonardo Bautista-Gomez

Osman S. Unsal

Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

Runtime-assisted cache coherence deactivation in task parallel programs.

Proceedings of the International Conference for High Performance Computing, 2018

Graph partitioning applied to DAG scheduling to reduce NUMA effects.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Data Prefetching on In-order Processors.

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies.

Rekai González-Alberquilla

Proceedings of the 32nd International Conference on Supercomputing, 2018

Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Architectural Support for Task Dependence Management with Flexible Software Scheduling.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Stencil codes on a vector length agnostic architecture.

Adrià Armejach

Helena Caminal

Juan M. Cebrian

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Task Scheduling Techniques for Asymmetric Multi-Core Systems.

IEEE Trans. Parallel Distributed Syst., 2017

Prediction of the impact of network switch utilization on application performance via active measurement.

Parallel Comput., 2017

iQ: An Efficient and Flexible Queue-Based Simulation Framework.

Proceedings of the 25th IEEE International Symposium on Modeling, 2017

ATM: Approximate Task Memoization in the Runtime System.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Iteration-fusing conjugate gradient.

Sicong Zhuang

Alexandre E. Eichenberger

Proceedings of the International Conference on Supercomputing, 2017

libPRISM: an intelligent adaptation of prefetch and SMT levels.

Pradip Bose

Proceedings of the International Conference on Supercomputing, 2017

Evaluating Scientific Workflow Execution on an Asymmetric Multicore Processor.

Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

Runtime-Assisted Shared Cache Insertion Policies Based on Re-reference Intervals.

Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016

Evaluation of HPC Applications' Memory Resource Consumption via Active Measurement.

IEEE Trans. Parallel Distributed Syst., 2016

PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite.

ACM Trans. Archit. Code Optim., 2016

MUSA: a multi-level simulation approach for next-generation HPC machines.

Proceedings of the International Conference for High Performance Computing, 2016

TaskPoint: Sampled simulation of task-based programs.

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

CATA: Criticality Aware Task Acceleration for Multicore Processors.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.

Proceedings of the 2016 International Conference on Supercomputing, 2016

POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

A framework for evaluating comprehensive fault resilience mechanisms in numerical programs.

J. Supercomput., 2015

Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7.

CoRR, 2015

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers.

Proceedings of the International Conference for High Performance Computing, 2015

Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.

Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Runtime-Aware Architectures.

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Runtime-Guided Management of Scratchpad Memories in Multicore Architectures.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Runtime-Aware Architectures: A First Approach.

Supercomput. Front. Innov., 2014

Active Measurement of Memory Resource Consumption.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Active Measurement of the Impact of Network Switch Utilization on Application Performance.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Evaluating Execution Time Predictability of Task-Based Programs on Multi-Core Processors.

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013

Performance Analysis Techniques for the Exascale Co-Design Process.

Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

2012

Poster: Autonomic Modeling of Data-Driven Application Behavior.

Steena D. S. Monteiro

Marc Casas-Guix

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Autonomic Modeling of Data-Driven Application Behavior.

Steena D. S. Monteiro

Marc Casas-Guix

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Fault resilience of the algebraic multi-grid solver.

Marc Casas-Guix

Bronis R. de Supinski

Martin Schulz

Proceedings of the International Conference on Supercomputing, 2012

2011

Simulating Whole Supercomputer Applications.

IEEE Micro, 2011

Extracting the optimal sampling frequency of applications using spectral analysis.

Concurr. Comput. Pract. Exp., 2011

Trace Spectral Analysis toward Dynamic Levels of Detail.

Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

2010

Spectral analysis of executions of computer programs and its applications on performance analysis.

PhD thesis, 2010

Automatic Phase Detection and Structure Extraction of MPI Applications.

Int. J. High Perform. Comput. Appl., 2010

2008

Automatic analysis of speedup of MPI applications.

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Prediction of behavior of MPI applications.

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007

Automatic Phase Detection of MPI Applications.

Proceedings of the Parallel Computing: Architectures, 2007

Automatic Structure Extraction from MPI Applications Tracefiles.
