Andrea Marongiu

ORCID: 0000-0003-1010-4762

According to our database, Andrea Marongiu authored at least 120 papers between 2006 and 2024.

Bibliography

2024
Invited Paper: On the Granularity of Bandwidth Regulation in FPGA-Based Heterogeneous Systems on Chip.
Proceedings of the 22nd International Workshop on Worst-Case Execution Time Analysis, 2024

HMB: Scheduling PREM-Like Real-Time Tasks at High Memory Bandwidth (Invited Paper).
Proceedings of the Fifth Workshop on Next Generation Real-Time Embedded Systems, 2024

2023
Performance Analysis of Six Semi-Automated Tumour Delineation Methods on [18F] Fluorodeoxyglucose Positron Emission Tomography/Computed Tomography (FDG PET/CT) in Patients with Head and Neck Cancer.
Sensors, September, 2023

Evaluating Controlled Memory Request Injection for Efficient Bandwidth Utilization and Predictable Execution in Heterogeneous SoCs.
ACM Trans. Embed. Comput. Syst., 2023

The Importance of Worst-Case Memory Contention Analysis for Heterogeneous SoCs.
CoRR, 2023

Fine-Grained QoS Control via Tightly-Coupled Bandwidth Monitoring and Regulation for FPGA-based Heterogeneous SoCs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022
A Taxonomy of Modern GPGPU Programming Methods: On the Benefits of a Unified Specification.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Understanding and Mitigating Memory Interference in FPGA-based HeSoCs.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

An FPGA Overlay for Efficient Real-Time Localization in 1/10th Scale Autonomous Vehicles.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
The Predictable Execution Model in Practice: Compiling Real Applications for COTS Hardware.
ACM Trans. Embed. Comput. Syst., 2021

HePREM: A Predictable Execution Model for GPU-based Heterogeneous SoCs.
IEEE Trans. Computers, 2021

Unmanned Vehicles in Smart Farming: a Survey and a Glance at Future Horizons.
Proceedings of the DroneSE and RAPIDO '21: Methods and Tools, 2021

A RISC-V-based FPGA Overlay to Simplify Embedded Accelerator Deployment.
Proceedings of the 24th Euromicro Conference on Digital System Design, 2021

2020
FlexFloat: A Software Library for Transprecision Computing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Dissecting the CUDA scheduling hierarchy: a Performance and Predictability Perspective.
Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, 2020

A Synergistic Approach to Predictable Compilation and Scheduling on Commodity Multi-Cores.
Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, 2020

Evaluating Controlled Memory Request Injection to Counter PREM Memory Underutilization.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2020

Mixed-data-model heterogeneous compilation and OpenMP offloading.
Proceedings of the CC '20: 29th International Conference on Compiler Construction, 2020

2019
Extending the Lifetime of Nano-Blimps via Dynamic Motor Control.
J. Signal Process. Syst., 2019

Exploring Shared Virtual Memory for FPGA Accelerators with a Configurable IOMMU.
IEEE Trans. Computers, 2019

Combining PREM compilation and static scheduling for high-performance and predictable MPSoC execution.
Parallel Comput., 2019

Design and Evaluation of SmallFloat SIMD extensions to the RISC-V ISA.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Taming Data Caches for Predictable Execution on GPU-based SoCs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

2018
Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking.
IEEE Trans. Parallel Distributed Syst., 2018

The Quest for Energy-Efficient I$ Design in Ultra-Low-Power Clustered Many-Cores.
IEEE Trans. Multi Scale Comput. Syst., 2018

Runtime Support for Multiple Offload-Based Programming Models on Clustered Manycore Accelerators.
IEEE Trans. Emerg. Top. Comput., 2018

Synergistic HW/SW Approximation Techniques for Ultralow-Power Parallel Computing.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators.
J. Real Time Image Process., 2018

Hardware Transactional Memory Exploration in Coherence-Free Many-Core Architectures.
Int. J. Parallel Program., 2018

Energy-Quality Scalable Integrated Circuits and Systems: Continuing Energy Scaling in the Twilight of Moore's Law.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2018

Guest Editorial Energy-Quality Scalable Circuits and Systems for Sensing and Computing: From Approximate to Communication-Inspired and Learning-Based.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2018

On the Cost of Freedom From Interference in Heterogeneous SoCs.
Proceedings of the 21st International Workshop on Software and Compilers for Embedded Systems, 2018

Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution.
Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, 2018

A Transprecision Floating-Point Architecture for Energy-Efficient Embedded Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

A transprecision floating-point platform for ultra-low power computing.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

HePREM: Enabling predictable GPU execution on heterogeneous SoC.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

HERO: an open-source research platform for HW/SW exploration of heterogeneous manycore systems.
Proceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, 2018

2017
Lightweight Virtual Memory Support for Zero-Copy Sharing of Pointer-Rich Data Structures in Heterogeneous Embedded SoCs.
IEEE Trans. Parallel Distributed Syst., 2017

Efficient Virtual Memory Sharing via On-Accelerator Page Table Walking in Heterogeneous Embedded SoCs.
ACM Trans. Embed. Comput. Syst., 2017

Edge-TM: Exploiting Transactional Memory for Error Tolerance and Energy Efficiency.
ACM Trans. Embed. Comput. Syst., 2017

A software stack for next-generation automotive systems on many-core heterogeneous platforms.
Microprocess. Microsystems, 2017

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA.
CoRR, 2017

On the Accuracy of Near-Optimal CPU-Based Path Planning for UAVs.
Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems, 2017

Enabling zero-copy OpenMP offloading on the PULP many-core accelerator.
Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems, 2017

GPU-Accelerated Real-Time Path Planning and the Predictable Execution Model.
Proceedings of the International Conference on Computational Science, 2017

Ultra low-power visual odometry for nano-scale unmanned aerial vehicles.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

GPUguard: Towards supporting a predictable execution model for heterogeneous SoC.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2017

2016
He-P2012: Performance and Energy Exploration of Architecturally Heterogeneous Many-Cores.
J. Signal Process. Syst., 2016

VirtualSoC: A Research Tool for Modern MPSoCs.
ACM Trans. Embed. Comput. Syst., 2016

Controlling NUMA effects in embedded manycore applications with lightweight nested parallelism support.
Parallel Comput., 2016

Exploring Single Source Shortest Path Parallelization on Shared Memory Accelerators.
Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, 2016

On the effectiveness of OpenMP teams for cluster-based many-core accelerators.
Proceedings of the International Conference on High Performance Computing & Simulation, 2016

Always-on motion detection with application-level error control on a near-threshold approximate computing platform.
Proceedings of the 2016 IEEE International Conference on Electronics, Circuits and Systems, 2016

A Software Stack for Next-Generation Automotive Systems on Many-Core Heterogeneous Platforms.
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016

An optimized task-based runtime system for resource-constrained parallel accelerators.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Enabling the heterogeneous accelerator model on ultra-low power microcontroller platforms.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

An energy-efficient parallel algorithm for real-time near-optimal UAV path planning.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Enabling OpenVX support in mW-scale parallel accelerators.
Proceedings of the 2016 International Conference on Compilers, 2016

Thrifty-malloc: A HW/SW codesign for the dynamic management of hardware transactional memory in embedded multicore systems.
Proceedings of the 2016 International Conference on Compilers, 2016

2015
GPU Acceleration for Simulating Massively Parallel Many-Core Platforms.
IEEE Trans. Parallel Distributed Syst., 2015

Simplifying Many-Core-Based Heterogeneous SoC Programming With Offload Directives.
IEEE Trans. Ind. Informatics, 2015

Architecture Support for Tightly-Coupled Multi-Core Clusters with Shared-Memory HW Accelerators.
IEEE Trans. Computers, 2015

P-SOCRATES: A parallel software framework for time-critical many-core systems.
Microprocess. Microsystems, 2015

A framework for optimizing OpenVX applications performance on embedded manycore accelerators.
Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, 2015

ADRENALINE: An OpenVX Environment to Optimize Embedded Vision Applications on Many-core Accelerators.
Proceedings of the IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2015

Enabling Scalable and Fine-Grained Nested Parallelism on Embedded Many-cores.
Proceedings of the IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2015

Synergistic Architecture and Programming Model Support for Approximate Micropower Computing.
Proceedings of the 2015 IEEE Computer Society Annual Symposium on VLSI, 2015

PULP: A parallel ultra low power platform for next generation IoT applications.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

Playing with Fire: Transactional Memory Revisited for Error-Resilient and Energy-Efficient MPSoC Execution.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, 2015

OpenMP and timing predictability: a possible union?
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Task scheduling strategies to mitigate hardware variability in embedded shared memory clusters.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs.
Proceedings of the 2015 International Conference on Hardware/Software Codesign and System Synthesis, 2015

An Evaluation of Memory Sharing Performance for Heterogeneous Embedded SoCs with Many-Core Accelerators.
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, 2015

Runtime Support for Multiple Offload-Based Programming Models on Embedded Manycore Accelerators.
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, 2015

Timing characterization of OpenMP4 tasking model.
Proceedings of the 2015 International Conference on Compilers, 2015

2014
Improving Resilience to Timing Errors by Exposing Variability Effects to Software in Tightly-Coupled Processor Clusters.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2014

The Challenge of Time-Predictability in Modern Many-Core Architectures.
Proceedings of the 14th International Workshop on Worst-Case Execution Time Analysis, 2014

Speculative synchronization for coherence-free embedded NUMA architectures.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Augmenting manycore programmable accelerators with photonic interconnect technology for the high-end embedded computing domain.
Proceedings of the Eighth IEEE/ACM International Symposium on Networks-on-Chip, 2014

A Virtualization Framework for IOMMU-less Many-Core Accelerators.
Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014

On the Relevance of Architectural Awareness for Efficient Fork/Join Support on Cluster-Based Manycores.
Proceedings of the 2nd International Workshop on Many-core Embedded Systems, 2014

A HLS-Based Toolflow to Design Next-Generation Heterogeneous Many-Core Platforms with Shared Memory.
Proceedings of the 12th IEEE International Conference on Embedded and Ubiquitous Computing, 2014

P-SOCRATES: A Parallel Software Framework for Time-Critical Many-Core Systems.
Proceedings of the 17th Euromicro Conference on Digital System Design, 2014

Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

A tightly-coupled hardware controller to improve scalability and programmability of shared-memory heterogeneous clusters.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

He-P2012: Architectural heterogeneity exploration on a scalable many-core platform.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
An integrated, programming model-driven framework for NoC-QoS support in cluster-based embedded many-cores.
Parallel Comput., 2013

SIMinG-1k: A thousand-core simulator running on general-purpose graphical processing units.
Concurr. Comput. Pract. Exp., 2013

Transparent and energy-efficient speculation on NUMA architectures for embedded MPSoCs.
Proceedings of the 1st International Workshop on Many-core Embedded Systems 2013, 2013

Improving the programmability of STHORM-based heterogeneous systems with offload-enabled OpenMP.
Proceedings of the 1st International Workshop on Many-core Embedded Systems 2013, 2013

VirtualSoC: A Full-System Simulation Environment for Massively Parallel Heterogeneous System-on-Chip.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Variation-tolerant OpenMP tasking on tightly-coupled processor clusters.
Proceedings of the Design, Automation and Test in Europe, 2013

Enabling fine-grained OpenMP tasking on tightly-coupled shared memory clusters.
Proceedings of the Design, Automation and Test in Europe, 2013

Architecture and programming model support for efficient heterogeneous computing on tightly-coupled shared-memory clusters.
Proceedings of the 2013 Conference on Design and Architectures for Signal and Image Processing, 2013

A variability-aware OpenMP environment for efficient execution of accuracy-configurable computation on shared-FPU processor clusters.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters.
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2013

2012
An OpenMP Compiler for Efficient Use of Distributed Scratchpad Memory in MPSoCs.
IEEE Trans. Computers, 2012

A tightly-coupled multi-core cluster with shared-memory HW accelerators.
Proceedings of the 2012 International Conference on Embedded Computer Systems: Architectures, 2012

Low-Overhead Barrier Synchronization for OpenMP-like Parallelism on the Single-Chip Cloud Computer.
Proceedings of the Many-core Applications Research Community (MARC) Symposium at RWTH Aachen University, 2012

OpenMP-based Synergistic Parallelization and HW Acceleration for On-Chip Shared-Memory Clusters.
Proceedings of the 15th Euromicro Conference on Digital System Design, 2012

Fast and lightweight support for nested parallelism on cluster-based embedded many-cores.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

Full system simulation of many-core heterogeneous SoCs using GPU and QEMU semihosting.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

2011
Supporting OpenMP on a multi-cluster embedded MPSoC.
Microprocess. Microsystems, 2011

Exploring instruction caching strategies for tightly-coupled shared-memory clusters.
Proceedings of the 2011 International Symposium on System on Chip, 2011

SoC-TM: integrated HW/SW support for transactional memory programming on embedded MPSoCs.
Proceedings of the 9th International Conference on Hardware/Software Codesign and System Synthesis, 2011

MPOpt-Cell: a high-performance data-flow programming environment for the CELL BE processor.
Proceedings of the 8th Conference on Computing Frontiers, 2011

GPGPU-Accelerated Parallel and Fast Simulation of Thousand-Core Platforms.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Scalable instruction set simulator for thousand-core architectures running on GPGPUs.
Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

Evaluating OpenMP Support Costs on MPSoCs.
Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

Efficient OpenMP data mapping for multicore platforms with vertically stacked memory.
Proceedings of the Design, Automation and Test in Europe, 2010

Exploring programming model-driven QoS support for NoC-based platforms.
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010

Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
OpenMP Support for NBTI-Induced Aging Tolerance in MPSoCs.
Proceedings of the Stabilization, 2009

Efficient OpenMP support and extensions for MPSoCs with explicitly managed memory hierarchy.
Proceedings of the Design, Automation and Test in Europe, 2009

2008
Analysis of Power Management Strategies for a Large-Scale SoC Platform in 65nm Technology.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

2007
Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms.
Proceedings of the 2007 International Conference on Compilers, 2007

2006
Automatic Application Partitioning on FPGA/CPU Systems Based on Detailed Low-Level Information.
Proceedings of the Ninth Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD 2006), 2006
