Doug Burger

Proc. ACM Program. Lang., 2024

2023

Microscaling Data Formats for Deep Learning.

CoRR, 2023

Shared Microexponents: A Little Shifting Goes a Long Way.

CoRR, 2023

With Shared Microexponents, A Little Shifting Goes a Long Way.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

2020

Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Inside Project Brainwave's Cloud-Scale, Real-Time AI Processor.

IEEE Micro, 2019

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.

CoRR, 2019

2018

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave.

IEEE Micro, 2018

Azure Accelerated Networking: SmartNICs in the Public Cloud.

Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation, 2018

A Configurable Cloud-Scale DNN Processor for Real-Time AI.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017

Configurable Clouds.

IEEE Micro, 2017

2016

A reconfigurable fabric for accelerating large-scale datacenter services.

Commun. ACM, 2016

A cloud-scale acceleration architecture.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Agile Co-Design for a Reconfigurable Datacenter.

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015

Neural acceleration for general-purpose approximate programs.

Commun. ACM, 2015

PocketTrend: Timely Identification and Delivery of Trending Search Content to Mobile Users.

Gennady Pekhimenko

Dimitrios Lymberopoulos

Oriana Riva

Karin Strauss

Proceedings of the 24th International Conference on World Wide Web, 2015

Priority-based cache allocation in throughput processors.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

A Scalable High-Bandwidth Architecture for Lossless Compression on FPGAs.

Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015

2014

Scaling Power and Performance via Processor Composability.

IEEE Trans. Computers, 2014

What the Future Holds for Solid-State Memory.

Karin Strauss

Computer, 2014

Dynamic-vector execution on a general purpose EDGE chip multiprocessor.

Alexander V. Veidenbaum

Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

General-purpose code acceleration with limited-precision analog computation.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Author retrospective for a NUCA substrate for flexible CMP cache sharing.

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

A Scalable Multi-engine Xpress9 Compressor with Asynchronous Data Transfer.

Joo-Young Kim

Scott Hauck

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

EVX: Vector execution on low power EDGE cores.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Multicore Model from Abstract Single Core Inputs.

IEEE Comput. Archit. Lett., 2013

Power challenges may end the multicore era.

Commun. ACM, 2013

Using managed runtime systems to tolerate holes in wearable memories.

Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013

How to implement effective prediction and forwarding for fusable dynamic multicore architectures.

Behnam Robatmili

Dong Li

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Reconfigurable computing in the era of post-silicon scaling [panel discussion].

Proceedings of the 21st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2013

2012

Power Limitations and Dark Silicon Challenge the Future of Multicore.

ACM Trans. Comput. Syst., 2012

Dark Silicon and the End of Multicore Scaling.

IEEE Micro, 2012

Charles R. (Chuck) Moore (1961 - 2012).

Mark Papermaster

IEEE Micro, 2012

Architecture support for disciplined approximate programming.

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011

Preventing PCM banks from seizing too much power.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Panel Statement.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Exploiting criticality to reduce bottlenecks in distributed uniprocessors.

Behnam Robatmili

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Pocket cloudlets.

Emmanouil Koukoumidis

Dimitrios Lymberopoulos

Karin Strauss

Jie Liu

Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011

The Good Block: Hardware/Software Design for Composable, Block-Atomic Processors.

Proceedings of the 15th Workshop on Interaction between Compilers and Computer Architectures, 2011

2010

Dynamic vectorization in the E2 dynamic multicore architecture.

Andrew Putnam

Aaron Smith

SIGARCH Comput. Archit. News, 2010

Phase-Change Technology and the Future of Main Memory.

IEEE Micro, 2010

The Future of Architectural Simulation.

IEEE Micro, 2010

Phase change memory architecture and the quest for scalability.

Commun. ACM, 2010

Use ECP, not ECC, for hard failures in resistive memories.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Dynamically replicated memory: building reliable systems from nanoscale resistive memories.

Engin Ipek

Jeremy Condit

Edmund B. Nightingale

Thomas Moscibroda

Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010

Evolving Compiler Heuristics to Manage Communication and Contention.

Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

Using dead blocks as a virtual victim cache.

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

Composable Multicore Chips.

Simha Sethumadhavan

Proceedings of the Multicore Processors and Systems, 2009

Mixed-Signal Approximate Computation: A Neural Predictor Case Study.

Daniel A. Jiménez

IEEE Micro, 2009

Better I/O through byte-addressable, persistent memory.

Jeremy Condit

Edmund B. Nightingale

Proceedings of the 22nd ACM Symposium on Operating Systems Principles 2009, 2009

Analysis of the TRIPS prototype block predictor.

Nitya Ranganathan

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

End-to-end validation of architectural power models.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Architecting phase change memory as a scalable DRAM alternative.

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

An evaluation of the TRIPS computer system.

Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008

Multitasking workload scheduling on flexible core chip multiprocessors.

SIGARCH Comput. Archit. News, 2008

High performance dense linear algebra on a spatially distributed processor.

Jeffrey R. Diamond

Behnam Robatmili

Robert A. van de Geijn

Kazushige Goto

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Strategies for mapping dataflow blocks to distributed hardware.

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency.

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Low-power, high-performance analog neural branch prediction.

Daniel A. Jiménez

Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Proceedings of the Languages and Compilers for Parallel Computing, 2008

Counting Dependence Predictors.

Franziska Roesner

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Feature selection and policy optimization for distributed instruction placement using reinforcement learning.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

A NUCA Substrate for Flexible CMP Cache Sharing.

IEEE Trans. Parallel Distributed Syst., 2007

Convergent Compilation Applied to Loop Unrolling.

Nicholas Nethercote

Kathryn S. McKinley

Trans. High Perform. Embed. Archit. Compil., 2007

On-Chip Interconnection Networks of the TRIPS Chip.

Paul Gratz

Changkyu Kim

Heather Hanson

IEEE Micro, 2007

Implementation and Evaluation of a Dynamically Routed Processor Operand Network.

Paul Gratz

Heather Hanson

Robert G. McDonald

Proceedings of the First International Symposium on Networks-on-Chips, 2007

Composable Lightweight Processors.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Late-binding: enabling unordered load-store queues.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

2006

Dataflow Predication.

Aaron Smith

Ramadass Nagarajan

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Merging Head and Tail Duplication for Convergent Hyperblock Formation.

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Critical path analysis of the TRIPS architecture.

Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Design and Implementation of the TRIPS Primary Memory System.

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Implementation and Evaluation of On-Chip Network Architectures.

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Compiling for EDGE Architectures.

Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006

A spatial path scheduling algorithm for EDGE architectures.

Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

2004

TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.

ACM Trans. Archit. Code Optim., 2004

Tools for computer architecture research.

Anand Sivasubramaniam

SIGMETRICS Perform. Evaluation Rev., 2004

Recent extensions to the SimpleScalar tool suite.

Todd M. Austin

SIGMETRICS Perform. Evaluation Rev., 2004

Scalable Hardware Memory Disambiguation for High-ILP Processors.

IEEE Micro, 2004

Speculative Incoherent Cache Protocols.

IEEE Micro, 2004

Scaling to the End of Silicon with EDGE Architectures.

Computer, 2004

Billion-Transistor Architectures: There and Back Again.

Computer, 2004

Coherence decoupling: making use of incoherence.

Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Scalable selective re-execution for EDGE architectures.

Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures.

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003

Static energy reduction techniques for microprocessor caches.

IEEE Trans. Very Large Scale Integr. Syst., 2003

Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements.

Deependra Talla

Lizy Kurian John

IEEE Trans. Computers, 2003

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.

IEEE Micro, 2003

Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches.

Changkyu Kim

IEEE Micro, 2003

Universal Mechanisms for Data-Parallel Architectures.

William R. Mark

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Microprocessor pipeline energy analysis.

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Guided Region Prefetching: A Cooperative Hardware/Software Approach.

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Exploiting Microarchitectural Redundancy For Defect Tolerance.

Charles R. Moore

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Routed Inter-ALU Networks for ILP Scalability and Performance.

Vincent Ajay Singh

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Architectural versus physical solutions for on-chip communication challenges.

Proceedings of the 1st IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2003

Designing Ultra-large Instruction Issue Windows.

Proceedings of the Advances in Computer Systems Architecture, 2003

2002

Errata on "Measuring Experimental Error in Microprocessor Simulation".

SIGARCH Comput. Archit. News, 2002

The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays.

M. S. Hrishikesh

Norman P. Jouppi

Keith I. Farkas

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic.

Proceedings of the 2002 International Conference on Dependable Systems and Networks (DSN 2002), 2002

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches.

Changkyu Kim

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001

Designing a Modern Memory Hierarchy with Hardware Prefetching.

Wei-Fen Lin

Steven K. Reinhardt

IEEE Trans. Computers, 2001

A design space evaluation of grid processor architectures.

Ramadass Nagarajan

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Measuring Experimental Error in Microprocessor Simulation.

Rajagopalan Desikan

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Filtering Superfluous Prefetches Using Density Vectors.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Reducing DRAM Latencies with an Integrated Memory Hierarchy Design.

Wei-Fen Lin

Steven K. Reinhardt

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Exploring the Design Space of Future CMPs.

Jaehyuk Huh

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Clock rate versus IPC: the end of the road for conventional microarchitectures.

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

1999

DataScalar: A memory-centric approach to computing.

Stefanos Kaxiras

J. Syst. Archit., 1999

1997

The SimpleScalar tool set, version 2.0.

Todd M. Austin

SIGARCH Comput. Archit. News, 1997

Limited bandwidth to affect processor design.

Alain Kägi

IEEE Micro, 1997

Billion-Transistor Architectures - Guest Editors' Introduction.

Computer, 1997

Changing Interaction of Compiler and Architecture.

Computer, 1997

Efficient Synchronization: Let Them Eat QOLB.

Alain Kägi

Proceedings of the 24th International Symposium on Computer Architecture, 1997

DataScalar Architectures.

Stefanos Kaxiras

Proceedings of the 24th International Symposium on Computer Architecture, 1997

Memory Systems.

Gurindar S. Sohi

Proceedings of the Computer Science and Engineering Handbook, 1997

1996

Paging tradeoffs in distributed-shared-memory multiprocessors.

J. Supercomput., 1996

Memory Systems.

ACM Comput. Surv., 1996

Memory Bandwidth Limitations of Future Microprocessors.

Alain Kägi

Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

1995

Accuracy vs. performance in parallel simulation of interconnection networks.
