Miquel Pericàs

ACM Trans. Archit. Code Optim., 2022

Energy-Efficiency Evaluation of OpenMP Loop Transformations and Runtime Constructs.

Henrik Valter

Axel Karlsson

CoRR, 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.

CoRR, 2022

STEER: Asymmetry-aware Energy Efficient Task Scheduler for Cluster-based Multicore Architectures.

Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Shisha: Online Scheduling of CNN Pipelines on Heterogeneous Architectures.

Proceedings of the Parallel Processing and Applied Mathematics, 2022

2021

Mitigating inefficient task mappings with an Adaptive Resource-Moldable Scheduler (ARMS).

Mustafa Abduljabbar

Mahmoud Eljammaly

CoRR, 2021

Vectorized Barrier and Reduction in LLVM OpenMP Runtime.

Muhammed Nufail Farooqi

Proceedings of the OpenMP: Enabling Massive Node-Level Parallelism, 2021

An online guided tuning approach to run CNN pipelines on edge devices.

Proceedings of the CF '21: Computing Frontiers Conference, 2021

CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling.

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems.

J. Parallel Distributed Comput., 2020

Proceedings of the Thirteenth International Workshop on Programmability and Architectures for Heterogeneous Multicores (MULTIPROG-2020).

Oscar Palomar

Jeckson Dellagostin Souza

Mahmoud Eljammaly

CoRR, 2020

Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments.

Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020, 2020

Enhancing Multithreaded Performance of Asymmetric Multicores with SIMD Offloading.

Antonio Carlos Schneider Beck

Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing.

Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

Enhancing Thread-Level Parallelism in Asymmetric Multicores using Transparent Instruction Offloading.

Jeckson Dellagostin Souza

Antonio Carlos Schneider Beck

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019

LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing.

CoRR, 2019

An Adaptive Performance-oriented Scheduler for Static and Dynamic Heterogeneity.

CoRR, 2019

High performance scheduling of mixed-mode DAGs on heterogeneous multicores.

Agnes Rohlin

Henrik Fahlgren

CoRR, 2019

QoS-Driven Coordinated Management of Resources to Save Energy in Multi-core Systems.

Mehrzad Nejat

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems.

Muhammad Waqar Azhar

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Elastic Places: An Adaptive Resource Manager for Scalable and Portable Performance.

ACM Trans. Archit. Code Optim., 2018

Global Dead-Block Management for Task-Parallel Programs.

ACM Trans. Archit. Code Optim., 2018

LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing.

Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

LEGaTO: towards energy-efficient, secure, fault-tolerant toolset for heterogeneous computing.

Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017

Trends in Data Locality Abstractions for HPC Systems.

IEEE Trans. Parallel Distributed Syst., 2017

Runtime-Assisted Global Cache Management for Task-Based Parallel Programs.

IEEE Comput. Archit. Lett., 2017

SWAS: Stealing Work Using Approximate System-Load Information.

Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

2016

Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

RADAR: Runtime-assisted dead region management for last-level caches.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

POSTER: ξ-TAO: A Cache-centric Execution Model and Runtime for Deep Parallel Multicore Topologies.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

DAGViz: a DAG visualization tool for analyzing task-parallel program traces.

Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Self-Tuned Software-Managed Energy Reduction in InfiniBand Links.

Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

2014

Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression.

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Scalable analysis of multicore data reuse and sharing.

Kenjiro Taura

Satoshi Matsuoka

Proceedings of the 2014 International Conference on Supercomputing, 2014

Software-Managed Power Reduction in Infiniband Links.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Efficient String Sorting on Multi- and Many-Core Architectures.

Aleksandr Drozd

Satoshi Matsuoka

Proceedings of the 2014 IEEE International Congress on Big Data, Anchorage, AK, USA, June 27, 2014

2013

Guest editorial: Workshop on Reconfigurable Computing.

Ioannis Sourdis

Christos-Savvas Bouganis

J. Syst. Archit., 2013

A template system for the efficient compilation of domain abstractions onto reconfigurable computers.

J. Syst. Archit., 2013

Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Analysis of Data Reuse in Task-Parallel Runtimes.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

2012

Assessing the Impact of Network Compression on Molecular Dynamics and Finite Element Methods.

Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

PPMC: Hardware scheduling and memory management support for multi accelerators.

Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

BSArc: blacksmith streaming architecture for HPC accelerators.

Proceedings of the Computing Frontiers Conference, CF'12, 2012

PPMC: A Programmable Pattern Based Memory Controller.

Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

2011

Assessing Accelerator-Based HPC Reverse Time Migration.

IEEE Trans. Parallel Distributed Syst., 2011

Implementation of a hierarchical N-body simulator using the OmpSs programming model.

Xavier Martorell

Yoav Etsion

Proceedings of the first workshop on Irregular applications: architectures and algorithms, 2011

TARCAD: A template architecture for reconfigurable accelerator designs.

Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool.

Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

2010

FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators.

Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

2009

Exploiting memory customization in FPGA for 3D stencil computations.

Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009

2008

Affordable kilo-instruction processors.

PhD thesis, 2008

Power-efficient VLIW design using clustering and widening.

Int. J. Embed. Syst., 2008

Vectorized AES Core for High-throughput Secure Environments.

Proceedings of the High Performance Computing for Computational Science, 2008

A Two-Level Load/Store Queue Based on Execution Locality.

Daniel A. Jiménez

Mateo Valero

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2007

A Flexible Heterogeneous Multi-Core Architecture.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

A decoupled KILO-instruction processor.

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

2005

Kilo-Instruction Processors: Overcoming the Memory Wall.

IEEE Micro, 2005

Chained In-Order/Out-of-Order DoubleCore Architecture.

Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Decoupled State-Execute Architecture.

Adrián Cristal

Rubén González

Mateo Valero

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor.

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

An asymmetric clustered processor based on value content.

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

2004

High-performance and low-power VLIW cores for numerical computations.

Int. J. High Perform. Comput. Netw., 2004

Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.

Proceedings of the Computer Systems: Architectures, 2004

An Optimized Front-End Physical Register File with Banking and Writeback Filtering.

Rubén González

Adrián Cristal