Antonino Tumeo

ORCID: 0000-0001-9452-120X

Affiliations:
  • Pacific Northwest National Laboratory, USA


According to our database, Antonino Tumeo authored at least 149 papers between 2006 and 2025.

Bibliography

2025
Analyzing inference workloads for spatiotemporal modeling.
Future Gener. Comput. Syst., 2025

2024
Data Transfer Optimizations for Host-CPU and Accelerators in AXI4MLIR.
CoRR, 2024

ICED: An Integrated CGRA Framework Enabling DVFS-Aware Acceleration.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

To Cache or not to Cache? Exploring the Design Space of Tunable, HLS-generated Accelerators.
Proceedings of the International Symposium on Memory Systems, 2024

FuseIM: Fusing Probabilistic Traversals for Influence Maximization on Exascale Systems.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

Towards Automated Generation of Chiplet-Based Systems Invited Paper.
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

2023
Fused Breadth-First Probabilistic Traversals on Distributed GPU Systems.
CoRR, 2023

VecPAC: A Vectorizable and Precision-Aware CGRA.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

High-Level Synthesis of Irregular Applications: A Case Study on Influence Maximization.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

Towards On-Chip Learning for Low Latency Reasoning with End-to-End Synthesis.
Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

2022
Svelto: High-Level Synthesis of Multi-Threaded Accelerators for Graph Analytics.
IEEE Trans. Computers, 2022

End-to-End Synthesis of Dynamically Controlled Machine Learning Accelerators.
IEEE Trans. Computers, 2022

Towards scaling community detection on distributed-memory heterogeneous systems.
Parallel Comput., 2022

Bridging Python to Silicon: The SODA Toolchain.
IEEE Micro, 2022

ASAP: automatic synthesis of area-efficient and precision-aware CGRAs.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Accelerating Random Forest Classification on GPU and FPGA.
Proceedings of the 51st International Conference on Parallel Processing, 2022

SODA Synthesizer: An Open-Source, Multi-Level, Modular, Extensible Compiler from High-Level Frameworks to Silicon.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

An MLIR-based Compiler Flow for System-Level Design and Hardware Acceleration.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

From High-Level Frameworks to custom Silicon with SODA.
Proceedings of the 2022 IEEE Hot Chips 34 Symposium, 2022

SO(DA)^2: End-to-end Generation of Specialized Reconfigurable Architectures (Invited Talk).
Proceedings of the 13th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and 11th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2022

The SODA approach: leveraging high-level synthesis for hardware/software co-design and hardware specialization: invited.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Hardware acceleration of complex machine learning models through modern high-level synthesis.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

SODA-OPT an MLIR based flow for co-design and high-level synthesis.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

VWC-BERT: Scaling Vulnerability-Weakness-Exploit Mapping on Modern AI Accelerators.
Proceedings of the IEEE International Conference on Big Data, 2022

MLIR Loop Optimizations for High-Level Synthesis: A Case Study.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
IEEE Trans. Parallel Distributed Syst., 2021

HAM: Hotspot-Aware Manager for Improving Communications With 3D-Stacked Memory.
IEEE Trans. Computers, 2021

Energy characterization of graph workloads.
Sustain. Comput. Informatics Syst., 2021

EXAGRAPH: Graph and combinatorial methods for enabling exascale applications.
Int. J. High Perform. Comput. Appl., 2021

The future is big graphs: a community view on graph processing systems.
Commun. ACM, 2021

High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

Automated Generation of Integrated Digital and Spiking Neuromorphic Machine Learning Accelerators.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Invited: Bambu: an Open-Source Research Framework for the High-Level Synthesis of Complex Applications.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Towards Automatic and Agile AI/ML Accelerator Design with End-to-End Synthesis.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020
Introduction to the TOPC Special Issue on Innovations in Systems for Irregular Applications, Part 2.
ACM Trans. Parallel Comput., 2020

Introduction to the TOPC Special Issue on Innovations in Systems for Irregular Applications, Part 1.
ACM Trans. Parallel Comput., 2020

ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing.
CoRR, 2020

Preempt: scalable epidemic interventions using submodular optimization on multi-GPU systems.
Proceedings of the International Conference for High Performance Computing, 2020

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Message from the workshop chairs.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

cuRipples: influence maximization on multi-GPU systems.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

OpenCGRA: An Open-Source Unified Framework for Modeling, Testing, and Evaluating CGRAs.
Proceedings of the 38th IEEE International Conference on Computer Design, 2020

SODA: a New Synthesis Infrastructure for Agile Hardware Design of Machine Learning Accelerators.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

Invited: Software Defined Accelerators From Learning Tools Environment.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
Special Issue on: Systems for Learning, Inferencing, and Discovering (SLID).
J. Parallel Distributed Comput., 2019

UWB-GCN: Hardware Acceleration of Graph-Convolution-Network through Runtime Workload Rebalancing.
CoRR, 2019

Advert: An Asynchronous Runtime for Fine-Grained Network Systems.
Proceedings of the IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2019

PIMS: a lightweight processing-in-memory accelerator for stencil computations.
Proceedings of the International Symposium on Memory Systems, 2019

Introduction to GrAPL 2019.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

MAC: Memory Access Coalescer for 3D-Stacked Memory.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Scaling and Quality of Modularity Optimization Methods for Graph Clustering.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

A Parallel Graph Environment for Real-World Data Analytics Workflows.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Data and model convergence: a case for software defined architectures.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

Software defined architectures for data analytics.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

POSTER: Memory Hotspot Optimization for Data-Intensive Applications.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Guest Editorial: Special Issue on Computing Frontiers.
Int. J. Parallel Program., 2018

Adaptive anonymization of data using b-edge cover.
Proceedings of the International Conference for High Performance Computing, 2018

MiniVite: A Graph Analytics Benchmarking Tool for Massively Parallel Systems.
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

Introduction to GraML 2018.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Distributed Louvain Algorithm for Graph Community Detection.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Scalable Distributed Memory Community Detection Using Vite.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

2017
Exploring Efficient Hardware Support for Applications with Irregular Memory Patterns on Multinode Manycore Architectures.
IEEE Trans. Parallel Distributed Syst., 2017

Exploring performance and energy tradeoffs for irregular applications: A case study on the Tilera many-core architecture.
J. Parallel Distributed Comput., 2017

Introduction to GraML Workshop.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Community Detection on the GPU.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Exploring DataVortex Systems for Irregular Applications.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Scalable static and dynamic community detection using Grappolo.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

Architecture independent integrated early performance and energy estimation.
Proceedings of the Eighth International Green and Sustainable Computing Conference, 2017

Pushing the Limits of Irregular Access Patterns on Emerging Network Architecture: A Case Study.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Special Issue on Theory and Practice of Irregular Applications (TaPIA).
Parallel Comput., 2016

Assessing Advanced Technology in CENATE.
Proceedings of the IEEE International Conference on Networking, 2016

Modeling the Impact of Silicon Photonics on Graph Analytics.
Proceedings of the IEEE International Conference on Networking, 2016

Efficient synthesis of graph methods: a dynamically scheduled architecture.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Exploring Data Vortex Network Architectures.
Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

A dynamically scheduled architecture for the synthesis of graph methods.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

A Dynamically Scheduled Architecture for the Synthesis of Graph Database Queries.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

Enabling the high level synthesis of data analytics accelerators.
Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2016

2015
Special Issue on Architectures and Algorithms for Irregular Applications (AAIA) - Guest editors' introduction.
J. Parallel Distributed Comput., 2015

Irregular Applications: From Architectures to Algorithms [Guest editors' introduction].
Computer, 2015

In-Memory Graph Databases for Web-Scale Data.
Computer, 2015

High Level Synthesis of RDF Queries for Graph Analytics.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Optimizing Approximate Weighted Matching on Nvidia Kepler K40.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

Inter-procedural resource sharing in High Level Synthesis through function proxies.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Function Proxies for Improved Resource Sharing in High Level Synthesis.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

High-Performance, Distributed Dictionary Encoding of RDF Datasets.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Optimizing irregular applications for energy and performance on the Tilera many-core architecture.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Power and performance trade-offs for Space Time Adaptive Processing.
Proceedings of the 26th IEEE International Conference on Application-specific Systems, 2015

GEMS: Graph Database Engine for Multithreaded Systems.
Proceedings of the Big Data - Algorithms, Analytics, and Applications, 2015

2014
Toward a data scalable solution for facilitating discovery of science resources.
Parallel Comput., 2014

Scaling Semantic Graph Databases in Size and Performance.
IEEE Micro, 2014

Scaling Irregular Applications through Data Aggregation and Software Multithreading.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

High-level synthesis of memory bound and irregular parallel applications with Bambu.
Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

An adaptive Memory Interface Controller for improving bandwidth utilization of hybrid and reconfigurable systems.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems.
Proceedings of the Numerical Computations with GPUs, 2014

2013
Composing Data Parallel Code for a SPARQL Graph Engine.
Proceedings of the International Conference on Social Computing, SocialCom 2013, 2013

Toward a data scalable solution for facilitating discovery of scientific data resources.
Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, 2013

YAPPA: A compiler-based parallelization framework for irregular applications on MPSoCs.
Proceedings of the 24th IEEE International Symposium on Rapid System Prototyping, 2013

Prototyping hardware support for irregular applications.
Proceedings of the 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2013

Exploring manycore multinode systems for irregular applications with FPGA prototyping.
Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

Power/Performance Trade-Offs of Small Batched LU Based Solvers on GPUs.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Accelerating subsurface transport simulation on heterogeneous clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Accelerating semantic graph databases on commodity clusters.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Exploring hardware support for scaling irregular applications on multi-node multi-core architectures.
Proceedings of the 24th International Conference on Application-Specific Systems, 2013

Ant Colony Optimization for mapping, scheduling and placing in reconfigurable systems.
Proceedings of the 2013 NASA/ESA Conference on Adaptive Hardware and Systems, 2013

2012
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer.
IEEE Trans. Parallel Distributed Syst., 2012

Aho-Corasick String Matching on Shared and Distributed-Memory Parallel Architectures.
IEEE Trans. Parallel Distributed Syst., 2012

Approximate weighted matching on emerging manycore and multithreaded architectures.
Int. J. High Perform. Comput. Appl., 2012

Designing Next-Generation Massively Multithreaded Architectures for Irregular Applications.
Computer, 2012

A High Performance Computing Network and System Simulator for the Power Grid: NGNS^2.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Efficient Sorting on the Tilera Manycore Architecture.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

A Bandwidth-Optimized Multi-core Architecture for Irregular Applications.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Towards efficient execution of irregular applications: panel outline.
Proceedings of the first workshop on Irregular applications: architectures and algorithm, 2011

Irregular applications: architectures & algorithms.
Proceedings of the first workshop on Irregular applications: architectures and algorithm, 2011

Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Experiences with String Matching on the Fermi Architecture.
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

Emulating Transactional Memory on FPGA Multiprocessors.
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

2010
Ant Colony Heuristic for Mapping and Scheduling Tasks and Communications on Heterogeneous Embedded Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2010

Accelerating DNA analysis applications on GPU clusters.
Proceedings of the IEEE 8th Symposium on Application Specific Processors, 2010

Multiprocessor systems-on-chip synthesis using multi-objective evolutionary computation.
Proceedings of the Genetic and Evolutionary Computation Conference, 2010

A Compact Transactional Memory Multiprocessor System on FPGA.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

A reconfigurable multiprocessor architecture for a reliable face recognition implementation.
Proceedings of the Design, Automation and Test in Europe, 2010

Efficient pattern matching on GPUs for intrusion detection systems.
Proceedings of the 7th Conference on Computing Frontiers, 2010

Mapping and scheduling of parallel C applications with ant colony optimization onto heterogeneous reconfigurable MPSoCs.
Proceedings of the 15th Asia South Pacific Design Automation Conference, 2010

2009
Performance estimation for task graphs combining sequential path profiling and control dependence regions.
Proceedings of the 7th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2009), 2009

Performance modeling of parallel applications on MPSoCs.
Proceedings of the 2008 IEEE International Symposium on System-on-Chip, 2009

A multiprocessor self-reconfigurable JPEG2000 encoder.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Evolutionary algorithms for the mapping of pipelined applications onto heterogeneous embedded systems.
Proceedings of the Genetic and Evolutionary Computation Conference, 2009

HW/SW methodologies for synchronization in FPGA multiprocessors.
Proceedings of the ACM/SIGDA 17th International Symposium on Field Programmable Gate Arrays, 2009

Mapping pipelined applications onto heterogeneous embedded systems: a bayesian optimization algorithm based approach.
Proceedings of the 7th International Conference on Hardware/Software Codesign and System Synthesis, 2009

Prototyping pipelined applications on a heterogeneous FPGA multiprocessor virtual platform.
Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008
Improving evolutionary exploration to area-time optimization of FPGA designs.
J. Syst. Archit., 2008

Ant colony optimization for mapping and scheduling in heterogeneous multiprocessor systems.
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

A Dual-Priority Real-Time Multiprocessor System on FPGA for Automotive Applications.
Proceedings of the Design, Automation and Test in Europe, 2008

Lightweight DMA management mechanisms for multiprocessors on FPGA.
Proceedings of the 19th IEEE International Conference on Application-Specific Systems, 2008

2007
An Interrupt Controller for FPGA-based Multiprocessors.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

An Evolutionary Approach to Area-Time Optimization of FPGA designs.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

An Internal Partial Dynamic Reconfiguration Implementation of the JPEG Encoder for Low-Cost FPGAs.
Proceedings of the 2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), 2007

A Pipelined Fast 2D-DCT Accelerator for FPGA-based SoCs.
Proceedings of the 2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), 2007

Automatic Parallelization of Sequential Specifications for Symmetric MPSoCs.
Proceedings of the Embedded System Design: Topics, Techniques and Trends, IFIP TC10 Working Conference: International Embedded Systems Symposium (IESS), May 30, 2007

A design kit for a fully working shared memory multiprocessor on FPGA.
Proceedings of the 17th ACM Great Lakes Symposium on VLSI 2007, 2007

Fitness inheritance in evolutionary and multi-objective high-level synthesis.
Proceedings of the IEEE Congress on Evolutionary Computation, 2007

A Self-Reconfigurable Implementation of the JPEG Encoder.
Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

2006
Hardware DWT accelerator for MultiProcessor System-on-Chip on FPGA.
Proceedings of the 2006 International Conference on Embedded Computer Systems: Architectures, 2006
