David A. Wood

  • University of Wisconsin-Madison, Madison, USA

According to our database, David A. Wood authored at least 121 papers between 1984 and 2021.

ACM Fellow

ACM Fellow 2005, "For contributions to shared-memory multiprocessing."




Byte-Select Compression.
ACM Trans. Archit. Code Optim., 2021

A Primer on Memory Consistency and Cache Coherence, Second Edition
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01764-3, 2020

The gem5 Simulator: Version 20.0+.
CoRR, 2020

Independent Forward Progress of Work-groups.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Pareto Governors for Energy-Optimal Computing.
ACM Trans. Archit. Code Optim., 2017

Could Compression Be of General Use? Evaluating Memory Compression across Domains.
ACM Trans. Archit. Code Optim., 2017

Energy-Proportional Computing: A New Definition.
Computer, 2017

Gravel: fine-grain GPU-initiated network messages.
Proceedings of the International Conference for High Performance Computing, 2017

LogCA: A High-Level Performance Model for Hardware Accelerators.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Crossing Guard: Mediating Host-Accelerator Coherence Interactions.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

Optimization Models for Three On-Chip Network Problems.
ACM Trans. Archit. Code Optim., 2016

Yet Another Compressed Cache: A Low-Cost Yet Effective Compressed Cache.
ACM Trans. Archit. Code Optim., 2016

When to use 3D Die-Stacked Memory for Bandwidth-Constrained Big Data Workloads.
CoRR, 2016

21st Century Computer Architecture.
CoRR, 2016

GPGPU Footprint Models to Estimate per-Core Power.
IEEE Comput. Archit. Lett., 2016

Lazy release consistency for GPUs.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

A Primer on Compression in the Memory Hierarchy
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01751-3, 2015

Implications of Emerging 3D GPU Architecture on the Scan Primitive.
SIGMOD Rec., 2015

LogCA: A Performance Model for Hardware Accelerators.
IEEE Comput. Archit. Lett., 2015

gem5-gpu: A Heterogeneous CPU-GPU Simulator.
IEEE Comput. Archit. Lett., 2015

Border control: sandboxing accelerators.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries.
Proceedings of the 11th International Workshop on Data Management on New Hardware, 2015

Synchronization Using Remote-Scope Promotion.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

Decoupled Compressed Cache: Exploiting Spatial Locality for Energy Optimization.
IEEE Micro, 2014

Skewed Compressed Caches.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Fine-grain task aggregation and coordination on GPUs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Supporting x86-64 address translation for 100s of GPU lanes.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

QuickRelease: A throughput-oriented approach to release consistency on GPUs.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Resolved: specialized architectures, languages, and system software should supplant general-purpose alternatives within a decade.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Heterogeneous-race-free memory models.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Optimization and Mathematical Modeling in Computer Architecture
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01773-5, 2013

Reuse-based online models for caches.
Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2013

Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Heterogeneous system coherence for integrated CPU-GPU systems.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Something old and something new: P-states can borrow microarchitecture techniques too.
Proceedings of the International Symposium on Low Power Electronics and Design, 2012

UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique.
Proceedings of the International Conference on Supercomputing, 2012

A Primer on Memory Consistency and Cache Coherence
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01733-9, 2011

The gem5 simulator.
SIGARCH Comput. Archit. News, 2011

Calvin: Deterministic or not? Free will to choose.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Safe and efficient supervised memory systems.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

WiDGET: Wisconsin decoupled grid execution tiles.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Forwardflow: a scalable core for power-constrained CMPs.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

StealthTest: Low Overhead Online Software Testing Using Transactional Memory.
Proceedings of the PACT 2009, 2009

Performance Pathologies in Hardware Transactional Memory.
IEEE Micro, 2008

TokenTM: Efficient Execution of Large Transactions with Hardware Transactional Memory.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

LogTM-SE: Decoupling Hardware Transactional Memory from Caches.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Interactions Between Compression and Prefetching in Chip Multiprocessors.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

A Case for Deconstructing Hardware Transactional Memory Systems.
Proceedings of the Programming Models for Ubiquitous Parallelism, 02.09. - 07.09.2007, 2007

IPC Considered Harmful for Multiprocessor Workloads.
IEEE Micro, 2006

ASR: Adaptive Selective Replication for CMP Caches.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

LogTM: log-based transactional memory.
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Supporting nested transactional memory in logTM.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Keynote talk: challenges in chip multiprocessor memory systems.
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset.
SIGARCH Comput. Archit. News, 2005

Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs.
J. Parallel Distributed Comput., 2005

Exploring Processor Design Options for Java-Based Middleware.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Improving Multiple-CMP Systems Using Token Coherence.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Managing Wire Delay in Large Chip-Multiprocessor Caches.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Adaptive Cache Compression for High-Performance Processors.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Using Speculation to Simplify Multiprocessor Design.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Token Coherence: A New Framework for Shared-Memory Multiprocessors.
IEEE Micro, 2003

Addressing Workload Variability in Architectural Simulations.
IEEE Micro, 2003

Simulating a $2M Commercial Server on a $2K PC.
Computer, 2003

TLC: Transmission Line Caches.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Token Coherence: Decoupling Performance and Correctness.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Using Destination-Set Prediction to Improve the Latency/Bandwidth Tradeoff in Shared-Memory Multiprocessors.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Memory System Behavior of Java-Based Middleware.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Variability in Architectural Simulations of Multi-Threaded Workloads.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Dynamic Verification of End-to-End Multiprocessor Invariants.
Proceedings of the 2003 International Conference on Dependable Systems and Networks (DSN 2003), 2003

Adaptive competitive self-organizing associative memory.
IEEE Trans. Syst. Man Cybern. Part A, 2002

Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol.
IEEE Trans. Parallel Distributed Syst., 2002

Full-system timing-first simulation.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2002

SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Bandwidth Adaptive Snooping.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator.
IEEE Concurr., 2000

Timestamp snooping: an approach for extending SMPs.
Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000

DBMSs on a Modern Processor: Where Does Time Go?
Proceedings of the VLDB'99, 1999

Multicast Snooping: A New Coherence Method Using a Multicast Address Network.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

Parallel Dispatch Queue: A Queue-Based Programming Abstraction to Parallelize Fine-Grain Communication Protocols.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Hardware Support for Flexible Distributed Shared Memory.
IEEE Trans. Computers, 1998

Analytic Evaluation of Shared-memory Systems with ILP Processors.
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

Tempest and Typhoon: User-Level Shared Memory.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998

Retrospective: Tempest and Typhoon: User-Level Shared Memory.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998

Sirocco: Cost-Effective Fine-Grain Distributed Shared Memory.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

Active Memory: A New Abstraction for Memory System Simulation.
ACM Trans. Model. Comput. Simul., 1997

Modeling Cost/Performance of a Parallel Computer Simulator.
ACM Trans. Model. Comput. Simul., 1997

Relaxed Consistency and Coherence Granularity in DSM Systems: A Performance Evaluation.
Proceedings of the Sixth ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1997

Reactive NUMA: A Design for Unifying S-COMA and CC-NUMA.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Scheduling Communication on a SMP Node Parallel Machine.
Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

Paging tradeoffs in distributed-shared-memory multiprocessors.
J. Supercomput., 1996

Problems, Challenges and the Importance of Performance Evaluation.
ACM Comput. Surv., 1996

Decoupled Hardware Support for Distributed Shared Memory.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Coherent Network Interfaces for Fine-Grain Communication.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Synchronization Hardware for Networks of Workstations: Performance vs. Cost.
Proceedings of the 10th international conference on Supercomputing, 1996

The Tempest approach to distributed shared memory.
Proceedings of the 1996 International Conference on Computer Design (ICCD '96), 1996

Cost-Effective Parallel Computing.
Computer, 1995

Where Is Software Headed? A Virtual Roundtable.
Computer, 1995

Dynamic Self-Invalidation: Reducing Coherence Overhead in Shared-Memory Multiprocessors.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Accuracy vs. performance in parallel simulation of interconnection networks.
Proceedings of IPPS '95, 1995

Tempest: A Substrate for Portable Parallel Programs.
Proceedings of the COMPCON '95: Technologies for the Information Superhighway, 1995

A Comparison of Trace-Sampling Techniques for Multi-Megabyte Caches.
IEEE Trans. Computers, 1994

The Wisconsin Wind Tunnel project: an annotated bibliography.
SIGARCH Comput. Archit. News, 1994

Cache Profiling and the SPEC Benchmarks: A Case Study.
Computer, 1994

Application-specific protocols for user-level shared memory.
Proceedings of Supercomputing '94, 1994

Cost/performance of a parallel computer simulator.
Proceedings of the Eighth Workshop on Parallel and Distributed Simulation, 1994

Fine-grain Access Control for Distributed Shared Memory.
Proceedings of ASPLOS-VI, 1994

Cooperative Shared Memory: Software and Hardware Support for Scalable Multiprocessors.
ACM Trans. Comput. Syst., 1993

Wisconsin Architectural Research Tool Set.
SIGARCH Comput. Archit. News, 1993

The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers.
Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1993

Kernel Support for the Wisconsin Wind Tunnel.
Proceedings of the USENIX Microkernels and Other Kernel Architectures Symposium, 1993

Mechanisms for Cooperative Shared Memory.
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

A Model for Estimating Trace-Sample Miss Ratios.
Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1991

Implementing Stack Simulation for Highly-Associative Memories.
Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1991

Verifying a Multiprocessor Cache Controller Using Random Test Generation.
IEEE Des. Test Comput., 1990

A VLSI chip set for a multiprocessor workstation. II. A memory management unit and cache controller.
IEEE J. Solid State Circuits, December, 1989

Supporting Reference and Dirty Bits in SPUR's Virtual Address Cache.
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

An In-Cache Address Translation Mechanism.
Proceedings of the 13th Annual Symposium on Computer Architecture, Tokyo, Japan, June 1986, 1986

Implementing A Cache Consistency Protocol.
Proceedings of the 12th Annual Symposium on Computer Architecture, 1985

Implementation Techniques for Main Memory Database Systems.
Proceedings of the SIGMOD'84, 1984
