Steven K. Reinhardt

Orcid: 0000-0002-2479-0030

  • Microsoft, Redmond, WA, USA
  • University of Michigan, Ann Arbor, USA (former)

According to our database, Steven K. Reinhardt authored at least 77 papers between 1993 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


IEEE Fellow 2013, "For contributions to computer system design and evaluation".







Parallelization Strategies for DLRM Embedding Bag Operator on AMD CPUs.
IEEE Micro, 2024

VecPAC: A Vectorizable and Precision-Aware CGRA.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Inside Project Brainwave's Cloud-Scale, Real-Time AI Processor.
IEEE Micro, 2019

Serving DNNs in Real Time at Datacenter Scale with Project Brainwave.
IEEE Micro, 2018

Generic System Calls for GPUs.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

A Configurable Cloud-Scale DNN Processor for Real-Time AI.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

ComP-net: command processor networking for efficient intra-kernel communications on GPUs.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

If You Build It, Will They Come?
IEEE Micro, 2017

Programming GPGPU Graph Applications with Linear Algebra Building Blocks.
Int. J. Parallel Program., 2017

GPU System Calls.
CoRR, 2017

Gravel: fine-grain GPU-initiated network messages.
Proceedings of the International Conference for High Performance Computing, 2017

GPU triggered networking for intra-kernel communications.
Proceedings of the International Conference for High Performance Computing, 2017

Extended task queuing: active messages for heterogeneous systems.
Proceedings of the International Conference for High Performance Computing, 2016

Agave: A benchmark suite for exploring the complexities of the Android software stack.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Achieving Exascale Capabilities through Heterogeneous Computing.
IEEE Micro, 2015

Graph Coloring on the GPU and Some Techniques to Improve Load Imbalance.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Fine-grain task aggregation and coordination on GPUs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

BelRed: Constructing GPGPU graph applications with software building blocks.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2014

QuickRelease: A throughput-oriented approach to release consistency on GPUs.
Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

Heterogeneous-race-free memory models.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

Heterogeneous system coherence for integrated CPU-GPU systems.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Pannotia: Understanding irregular GPGPU graph applications.
Proceedings of the IEEE International Symposium on Workload Characterization, 2013

Massively Multithreaded Computing Systems.
Computer, 2012

The gem5 simulator.
SIGARCH Comput. Archit. News, 2011

Towards the ideal on-chip fabric for 1-to-many and many-to-1 communication.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Server Designs for Warehouse-Computing Environments.
IEEE Micro, 2009

End-to-end performance forecasting: finding bottlenecks before they happen.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Disaggregated memory for expansion and sharing in blade servers.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

PicoServer: Using 3D stacking technology to build energy efficient servers.
ACM J. Emerg. Technol. Comput. Syst., 2008

Full-System Critical Path Analysis.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

Understanding and Designing New Server Architectures for Emerging Warehouse-Computing Environments.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

QoS policies and architecture for cache/memory in CMP platforms.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

Analysis of hardware prefetching across virtual page boundaries.
Proceedings of the 4th Conference on Computing Frontiers, 2007

The M5 Simulator: Modeling Networked Systems.
IEEE Micro, 2006

PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Integrated network interfaces for high-bandwidth TCP/IP.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

Exploring the cache design space for large scale CMPs.
SIGARCH Comput. Archit. News, 2005

ExtraVirt: detecting and recovering from transient processor faults.
Proceedings of the 20th ACM Symposium on Operating Systems Principles 2005, 2005

How to Fake 1000 Registers.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005

The Soft Error Problem: An Architectural Perspective.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

A Unified Compressed Memory Hierarchy.
Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Performance Analysis of System Overheads in TCP/IP Workloads.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

Reducing the Soft-Error Rate of a High-Performance Microprocessor.
IEEE Micro, 2004

A compressed memory hierarchy using an indirect index cache.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Cache Scrubbing in Microprocessors: Myth or Necessity?
Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2004), 2004

Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Measuring Architectural Vulnerability Factors.
IEEE Micro, 2003

A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Guided Region Prefetching: A Cooperative Hardware/Software Approach.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

The Impact of Resource Partitioning on SMT Processors.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

A Scalable Instruction Queue Design Using Dependence Chains.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Detailed Design and Evaluation of Redundant Multithreading Alternatives.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Designing a Modern Memory Hierarchy with Hardware Prefetching.
IEEE Trans. Computers, 2001

Automatic performance setting for dynamic voltage scaling.
Proceedings of MOBICOM 2001, 2001

Filtering Superfluous Prefetches Using Density Vectors.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Reducing DRAM Latencies with an Integrated Memory Hierarchy Design.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator.
IEEE Concurr., 2000

Integrating hardware and software concepts in a microprocessor-based system design lab.
Proceedings of the 2000 workshop on Computer architecture education, 2000

Transient fault detection via simultaneous multithreading.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

A fully associative software-managed cache design.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Thread Level Parallelism and Interactive Performance of Desktop Applications.
Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000

Simultaneous Subordinate Microthreading (SSMT).
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

Hardware Support for Flexible Distributed Shared Memory.
IEEE Trans. Computers, 1998

Computer architecture instruction at the University of Michigan.
Proceedings of the 1998 workshop on Computer architecture education, 1998

Tempest and Typhoon: User-Level Shared Memory.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Retrospective: Tempest and Typhoon: User-Level Shared Memory.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Decoupled Hardware Support for Distributed Shared Memory.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Application-specific protocols for user-level shared memory.
Proceedings of Supercomputing '94, 1994

Fine-grain Access Control for Distributed Shared Memory.
Proceedings of ASPLOS-VI, 1994

Cooperative Shared Memory: Software and Hardware Support for Scalable Multiprocessors.
ACM Trans. Comput. Syst., 1993

The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel Computers.
Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1993

Kernel Support for the Wisconsin Wind Tunnel.
Proceedings of the USENIX Microkernels and Other Kernel Architectures Symposium, 1993

Mechanisms for Cooperative Shared Memory.
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
