David Wentzlaff

ORCID: 0000-0002-6337-5630

According to our database, David Wentzlaff authored at least 78 papers between 2002 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



On csauthors.net:


Optically Connected Multi-Stack HBM Modules for Large Language Model Training and Inference.
IEEE Comput. Archit. Lett., 2025

Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

LUCIE: A Universal Chiplet-Interposer Design Framework for Plug-and-Play Integration.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

MindPalace: A Framework for Studying Microarchitecture Design of Function-as-a-Service.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

MuchiSim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

LLMCompass: Enabling Efficient Hardware Design for Large Language Model Inference.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

A Hardware Evaluation Framework for Large Language Model Inference.
CoRR, 2023

Tascade: Hardware Support for Atomic-free, Asynchronous and Efficient Reduction Trees.
CoRR, 2023

DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications.
CoRR, 2023

Using LLMs to Facilitate Formal Verification of RTL.
CoRR, 2023

Massive Data-Centric Parallelism in the Chiplet Era.
CoRR, 2023

Duet: Creating Harmony between Processors and Embedded FPGAs.
CoRR, 2023

AutoCC: Automatic Discovery of Covert Channels in Time-Shared Hardware.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Building Efficient Neural Prefetcher.
Proceedings of the International Symposium on Memory Systems, 2023

Supply Chain Aware Computer Architecture.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Seizing the Bandwidth Scaling of On-Package Interconnect in a Post-Moore's Law World.
Proceedings of the 37th International Conference on Supercomputing, 2023

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Duet: Creating Harmony between Processors and Embedded FPGAs.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

DECADES: A 67mm², 1.46TOPS, 55 Giga Cache-Coherent 64-bit RISC-V Instructions per second, Heterogeneous Manycore SoC with 109 Tiles including Accelerators, Intelligent Storage, and eFPGA in 12nm FinFET.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2023

CIFER: A 12nm, 16mm², 22-Core SoC with a 1541 LUT6/mm² 1.92 MOPS/LUT, Fully Synthesizable, CacheCoherent, Embedded FPGA.
Proceedings of the IEEE Custom Integrated Circuits Conference, 2023

SMAPPIC: Scalable Multi-FPGA Architecture Prototype Platform in the Cloud.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

OPDB: A Scalable and Modular Design Benchmark.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

FracDRAM: Fractional Values in Off-the-Shelf DRAM.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Tiny but mighty: designing and realizing scalable latency tolerance for manycore SoCs.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Class-Discriminative CNN Compression.
Proceedings of the 26th International Conference on Pattern Recognition, 2022

Evolving transferable neural pruning functions.
Proceedings of the GECCO '22: Genetic and Evolutionary Computation Conference, Boston, Massachusetts, USA, July 9, 2022

Evolving Transferable Pruning Functions.
CoRR, 2021

PRGA: An Open-Source FPGA Research and Prototyping Framework.
Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

AutoSVA: Democratizing Formal Verification of RTL Module Interactions.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

Burstable Instances for Clouds: Performance Modeling, Equilibrium Analysis, and Revenue Maximization.
IEEE/ACM Trans. Netw., 2020

OpenPiton at 5: A Nexus for Open and Agile Hardware Design.
IEEE Micro, 2020

Rethinking Class-Discrimination Based CNN Channel Pruning.
CoRR, 2020

Enabling Programmable Transport Protocols in High-Speed NICs.
Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, 2020

HyperTRIO: Hyper-Tenant Translation of I/O Addresses.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Automated Design of FPGAs Facilitated by Cycle-Free Routing.
Proceedings of the 30th International Conference on Field-Programmable Logic and Applications, 2020

Cycle-Free FPGA Routing Graphs.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Organic-Flow: An Open-Source Organic Standard Cell Library and Process Development Kit.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

BYOC: A "Bring Your Own Core" Framework for Heterogeneous-ISA Research.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

OpenPiton: an open source hardware platform for your research.
Commun. ACM, 2019

Architectural Implications of Function-as-a-Service Computing.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

The Accelerator Wall: Limits of Chip Specialization.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

JuxtaPiton: Enabling Heterogeneous-ISA Research with RISC-V and SPARC FPGA Soft-cores.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

PiCL: A Software-Transparent, Persistent Cache Log for Nonvolatile Main Memory.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

CABLE: A CAche-Based Link Encoder for Bandwidth-Starved Manycores.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Scaling Datacenter Accelerators with Compute-Reuse Architectures.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Power and Energy Characterization of an Open Source 25-Core Manycore Processor.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Acoustic Denial of Service Attacks on Hard Disk Drives.
Proceedings of the 2018 Workshop on Attacks and Solutions in Hardware Security, 2018

Piton: A Manycore Processor for Multitenant Clouds.
IEEE Micro, 2017

Acoustic Denial of Service Attacks on HDDs.
CoRR, 2017

Architectural tradeoffs for biodegradable computing.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Camouflage: Memory Traffic Shaping to Mitigate Timing Attacks.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Towards Deploying Decommissioned Mobile Devices as Cheap Energy-Efficient Compute Nodes.
Proceedings of the 9th USENIX Workshop on Hot Topics in Cloud Computing, 2017

Incentivizing self-capping to increase cloud utilization.
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

MITTS: Memory Inter-arrival Time Traffic Shaping.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

CASH: Supporting IaaS Customers with a Sub-core Configurable Architecture.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Piton: A 25-core academic manycore research processor.
Proceedings of the 2016 IEEE Hot Chips 28 Symposium (HCS), 2016

Availability Knob: Flexible User-Defined Availability in the Cloud.
Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016

OpenPiton: An Open Source Manycore Research Framework.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

MORC: a manycore-oriented compressed cache.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Coherence domain restriction on large scale systems.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Execution Drafting: Energy Efficiency through Computation Deduplication.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

PriME: A parallel and distributed simulator for thousand-core chips.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

The sharing architecture: sub-core configurability for IaaS clouds.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Distributed data structure for factored operating systems.
PhD thesis, 2012

Configurable fine-grain protection for multicore processor virtualization.
Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

The case for elastic operating system services in fos.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

Remote Store Programming.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

An operating system for multicore and clouds: mechanisms and implementation.
Proceedings of the 1st ACM Symposium on Cloud Computing, 2010


Factored operating systems (fos): the case for a scalable operating system for multicores.
ACM SIGOPS Oper. Syst. Rev., 2009

TILE64 - Processor: A 64-Core SoC with Mesh Interconnect.
Proceedings of the 2008 IEEE International Solid-State Circuits Conference, 2008

On-Chip Interconnection Architecture of the Tile Processor.
IEEE Micro, 2007

Constructing Virtual Architectures on a Tiled Processor.
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

A Quantitative Comparison of Reconfigurable, Tiled, and Conventional Architectures on Bit-Level Computation.
Proceedings of the 12th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2004), 2004

Energy characterization of a tiled architecture processor with on-chip networks.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs.
IEEE Micro, 2002
