Miquel Moretó

Orcid: 0000-0002-9848-8758

According to our database, Miquel Moretó authored at least 138 papers between 2006 and 2024.

Bibliography

2024
Hardware Acceleration for High-Volume Operations of CRYSTALS-Kyber and CRYSTALS-Dilithium.
ACM Trans. Reconfigurable Technol. Syst., September, 2024

OpenPiton4HPC: Optimizing OpenPiton Toward High-Performance Manycores.
IEEE J. Emerg. Sel. Topics Circuits Syst., September, 2024

GenArchBench: A genomics benchmark suite for arm HPC processors.
Future Gener. Comput. Syst., 2024

A Mess of Memory System Benchmarking, Simulation and Application Profiling.
CoRR, 2024

Floating Point HUB Adder for RISC-V Sargantana Processor.
CoRR, 2024

A Safety-Critical, RISC-V SoC Integrated and ASIC-Ready Classic McEliece Accelerator.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2024

2023
WFA-FPGA: An efficient accelerator of the wavefront algorithm for short and long read genomics alignment.
Future Gener. Comput. Syst., December, 2023

Special Issue on the 2023 International Symposium on Networks-on-Chip (NOCS 2023).
IEEE Des. Test, December, 2023

WFA-GPU: gap-affine pairwise read-alignment using GPUs.
Bioinform., December, 2023

Accurate and efficient constrained molecular dynamics of polymers using Newton's method and special purpose code.
Comput. Phys. Commun., July, 2023

Functional Verification of a RISC-V Vector Accelerator.
IEEE Des. Test, June, 2023

Adaptive Power Shifting for Power-Constrained Heterogeneous Systems.
IEEE Trans. Computers, March, 2023

Optimal gap-affine alignment in O(s) space.
Bioinform., February, 2023

Porting and Optimizing BWA-MEM2 Using the Fujitsu A64FX Processor.
IEEE ACM Trans. Comput. Biol. Bioinform., 2023

SpChar: Characterizing the Sparse Puzzle via Decision Trees.
CoRR, 2023

Characterization of a Coherent Hardware Accelerator Framework for SoCs.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2023

A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

OpenPiton Optimizations Towards High Performance Manycores.
Proceedings of the 16th International Workshop on Network on Chip Architectures, 2023

GMX: Instruction Set Extensions for Fast, Scalable, and Efficient Genome Sequence Alignment.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

DynAMO: Improving Parallelism Through Dynamic Placement of Atomic Memory Operations.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

WFAsic: A High-Performance ASIC Accelerator for DNA Sequence Alignment on a RISC-V SoC.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Mix-GEMM: An efficient HW-SW Architecture for Mixed-Precision Quantized Deep Neural Networks Inference on Edge Devices.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

RISC-V for Genome Data Analysis: Opportunities and Challenges.
Proceedings of the 38th Conference on Design of Circuits and Integrated Systems, 2023


Fast Behavioural RTL Simulation of 10B Transistor SoC Designs with Metro-Mpi.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2022
Compiler-Assisted Compaction/Restoration of SIMD Instructions.
IEEE Trans. Parallel Distributed Syst., 2022

A Security Model for Randomization-based Protected Caches.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

Compressed Sparse FM-Index: Fast Sequence Alignment Using Large K-Steps.
IEEE ACM Trans. Comput. Biol. Bioinform., 2022

Accelerating Edit-Distance Sequence Alignment on GPU Using the Wavefront Algorithm.
IEEE Access, 2022

TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming Models.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Sargantana: A 1 GHz+ In-Order RISC-V Processor with SIMD Vector Extensions in 22nm FD-SOI.
Proceedings of the 25th Euromicro Conference on Digital System Design, 2022


Towards Reconfigurable Accelerators in HPC: Designing a Multipurpose eFPGA Tile for Heterogeneous SoCs.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

2021
On the use of many-core Marvell ThunderX2 processor for HPC workloads.
J. Supercomput., 2021

Intelligent Adaptation of Hardware Knobs for Improving Performance and Power Consumption.
IEEE Trans. Computers, 2021

PlugSMART: a pluggable open-source module to implement multihop bypass in networks-on-chip.
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021

PLANAR: a programmable accelerator for near-memory data rearrangement.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

gem5 + rtl: A Framework to Enable RTL Models Inside a Full-System Simulator.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

VIA: A Smart Scratchpad for Vector Units with Application to Sparse Matrix Computations.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

HLS-Based HW/SW Co-Design of the Post-Quantum Classic McEliece Cryptosystem.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

An FPGA Accelerator of the Wavefront Algorithm for Genomics Pairwise Alignment.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

OpenCL-based FPGA Accelerator for Semi-Global Approximate String Matching Using Diagonal Bit-Vectors.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

PrioRAT: Criticality-Driven Prioritization Inside the On-Chip Memory Hierarchy.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

SafeSU: an Extended Statistics Unit for Multicore Timing Interference.
Proceedings of the 26th IEEE European Test Symposium, 2021

Mont-Blanc 2020: Towards Scalable and Power Efficient European HPC Processors.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020
Efficiency analysis of modern vector architectures: vector ALU sizes, core counts and clock frequencies.
J. Supercomput., 2020

Using Arm's scalable vector extension on stencil codes.
J. Supercomput., 2020

Semi-automatic validation of cycle-accurate simulation infrastructures: The case for gem5-x86.
Future Gener. Comput. Syst., 2020

Runtime-guided ECC protection using online estimation of memory vulnerability.
Proceedings of the International Conference for High Performance Computing, 2020

BST: A BookSim-Based Toolset to Simulate NoCs with Single- and Multi-Hop Bypass.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

RICH: implementing reductions in the cache hierarchy.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Modeling and optimizing NUMA effects and prefetching with machine learning.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Improving Predication Efficiency through Compaction/Restoration of SIMD Instructions.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

A Hardware/Software Co-Design of K-mer Counting Using a CAPI-Enabled FPGA.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020


2019
Design trade-offs for emerging HPC processors based on mobile market technology.
J. Supercomput., 2019

Sampled Simulation of Task-Based Programs.
IEEE Trans. Computers, 2019

On the maturity of parallel applications for asymmetric multi-core processors.
J. Parallel Distributed Comput., 2019

The international race towards Exascale in Europe.
CCF Trans. High Perform. Comput., 2019

Optimizing computation-communication overlap in asynchronous task-based programs: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

On the Benefits of Tasking with OpenMP.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Lagarto I RISC-V Multi-core: Research Challenges to Build and Integrate a Network-on-Chip.
Proceedings of the Supercomputing, 2019

Design Space Exploration of Next-Generation HPC Machines.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A Vulnerability Factor for ECC-protected Memory.
Proceedings of the 25th IEEE International Symposium on On-Line Testing and Robust System Design, 2019

Power efficient job scheduling by predicting the impact of processor manufacturing variability.
Proceedings of the ACM International Conference on Supercomputing, 2019

Optimizing computation-communication overlap in asynchronous task-based programs.
Proceedings of the ACM International Conference on Supercomputing, 2019

POSTER: An Optimized Predication Execution for SIMD Extensions.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

POSTER: SPiDRE: Accelerating Sparse Memory Access Patterns.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Asynchronous and Exact Forward Recovery for Detected Errors in Iterative Solvers.
IEEE Trans. Parallel Distributed Syst., 2018

Reducing Cache Coherence Traffic with a NUMA-Aware Runtime Approach.
IEEE Trans. Parallel Distributed Syst., 2018

Performance and energy effects on task-based parallelized applications - User-directed versus manual vectorization.
J. Supercomput., 2018

Memory Vulnerability: A Case for Delaying Error Reporting.
CoRR, 2018

TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

Runtime-assisted cache coherence deactivation in task parallel programs.
Proceedings of the International Conference for High Performance Computing, 2018

Graph partitioning applied to DAG scheduling to reduce NUMA effects.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

ChopStiX: Systematic Extraction of Code-Representative Microbenchmarks.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Data Prefetching on In-order Processors.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Architectural Support for Task Dependence Management with Flexible Software Scheduling.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Stencil codes on a vector length agnostic architecture.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Task Scheduling Techniques for Asymmetric Multi-Core Systems.
IEEE Trans. Parallel Distributed Syst., 2017

SEDEA: A Sensible Approach to Account DRAM Energy in Multicore Systems.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

iQ: An Efficient and Flexible Queue-Based Simulation Framework.
Proceedings of the 25th IEEE International Symposium on Modeling, 2017

ATM: Approximate Task Memoization in the Runtime System.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

libPRISM: an intelligent adaptation of prefetch and SMT levels.
Proceedings of the International Conference on Supercomputing, 2017

Evaluating Scientific Workflow Execution on an Asymmetric Multicore Processor.
Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

Runtime-Assisted Shared Cache Insertion Policies Based on Re-reference Intervals.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2016
DReAM: An Approach to Estimate per-Task DRAM Energy in Multicore Systems.
ACM Trans. Design Autom. Electr. Syst., 2016

Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach.
IEEE Trans. Computers, 2016

Sensible Energy Accounting with Abstract Metering for Multicore Systems.
ACM Trans. Archit. Code Optim., 2016

PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite.
ACM Trans. Archit. Code Optim., 2016

MUSA: a multi-level simulation approach for next-generation HPC machines.
Proceedings of the International Conference for High Performance Computing, 2016

TaskPoint: Sampled simulation of task-based programs.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

CATA: Criticality Aware Task Acceleration for Multicore Processors.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.
Proceedings of the 2016 International Conference on Supercomputing, 2016

POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7.
CoRR, 2015

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers.
Proceedings of the International Conference for High Performance Computing, 2015

Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015


Runtime-Guided Management of Scratchpad Memories in Multicore Architectures.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Runtime-Aware Architectures: A First Approach.
Supercomput. Front. Innov., 2014

Per-task Energy Accounting in Computing Systems.
IEEE Comput. Archit. Lett., 2014

DReAM: Per-Task DRAM Energy Metering in Multicore Systems.
Proceedings of the Euro-Par 2014 Parallel Processing, 2014

Evaluating Execution Time Predictability of Task-Based Programs on Multi-Core Processors.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
Fair CPU time accounting in CMP+SMT processors.
ACM Trans. Archit. Code Optim., 2013

Hardware support for accurate per-task energy metering in multicore systems.
ACM Trans. Archit. Code Optim., 2013

Task mapping in rectangular twisted tori.
Proceedings of the 2013 Spring Simulation Multiconference, SpringSim '13, 2013

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

On the convergence of mainstream and mission-critical markets.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Tessellation: refactoring the OS around explicit resource containers with continuous adaptation.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

2012
CPU Accounting for Multicore Processors.
IEEE Trans. Computers, 2012

Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Characterizing thread placement in the IBM POWER7 processor.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Optimal task assignment in multithreaded processors: a statistical approach.
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011
Dynamic Cache Partitioning Based on the MLP of Cache Misses.
Trans. High Perform. Embed. Archit. Compil., 2011

Simulating Whole Supercomputer Applications.
IEEE Micro, 2011

2010
Improving cache Behavior in CMP architectures through cache partitioning techniques.
PhD thesis, 2010

Twisted Torus Topologies for Enhanced Interconnection Networks.
IEEE Trans. Parallel Distributed Syst., 2010

Adapting cache partitioning algorithms to pseudo-LRU replacement policies.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Load balancing using dynamic cache allocation.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
FlexDCP: a QoS framework for CMP architectures.
ACM SIGOPS Oper. Syst. Rev., 2009

CPU Accounting in CMP Processors.
IEEE Comput. Archit. Lett., 2009

ITCA: Inter-task Conflict-Aware CPU Accounting for CMPs.
Proceedings of the PACT 2009, 2009

2008
Modeling Toroidal Networks with the Gaussian Integers.
IEEE Trans. Computers, 2008

Multicore Resource Management.
IEEE Micro, 2008

MLP-Aware Dynamic Cache Partitioning.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

Architecture Performance Prediction Using Evolutionary Artificial Neural Networks.
Proceedings of the Applications of Evolutionary Computing, 2008

Evolutionary system for prediction and optimization of hardware architecture performance.
Proceedings of the IEEE Congress on Evolutionary Computation, 2008

2007
Explaining Dynamic Cache Partitioning Speed Ups.
IEEE Comput. Archit. Lett., 2007

Online Prediction of Applications Cache Utility.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

Mixed-radix Twisted Torus Interconnection Networks.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006
Dense Gaussian Networks: Suitable Topologies for On-Chip Multiprocessors.
Int. J. Parallel Program., 2006

A Generalization of Perfect Lee Codes over Gaussian Integers.
Proceedings of the 2006 IEEE International Symposium on Information Theory, 2006

