Gurindar S. Sohi

  • University of Wisconsin-Madison, Madison, WI, USA

According to our database, Gurindar S. Sohi authored at least 110 papers between 1985 and 2024.


ACM Fellow

ACM Fellow 2003, "For contributions to computer architecture."

IEEE Fellow

IEEE Fellow 2004, "For contributions to thread-level speculation in computer architecture."








A Non-Traditional Approach to Assisting Data Address Translation.
CoRR, 2024

Instruction Block Movement with Coupled High-Level Program Sequencing.
CoRR, 2024

Special Issue on Hot Chips 33.
IEEE Micro, 2022

Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses.
Proceedings of MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Filtering Translation Bandwidth with Virtual Caching.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

ATM: Approximate Task Memoization in the Runtime System.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Revisiting virtual L1 caches: A practical design using dynamic synonym remapping.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Hot Chips 26 [Guest editors' introduction].
IEEE Micro, 2015

Adaptive, efficient, parallel execution of parallel programs.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Globally precise-restartable execution of parallel programs.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Author retrospective for cooperative cache partitioning for chip multiprocessors.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Welcome program chairs.
Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), 2014

Holistic run-time parallelism management for time and energy efficiency.
Proceedings of the International Conference on Supercomputing, 2013

Supporting Overcommitted Virtual Machines through Hardware Spin Detection.
IEEE Trans. Parallel Distributed Syst., 2012

Efficient, precise-restartable program execution on future multicores.
Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012

Dataflow execution of sequential imperative programs on multicore architectures.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Speculatively Multithreaded Architectures.
Proceedings of the Multicore Processors and Systems, 2009

Dynamic heterogeneity and the need for multicore virtualization.
ACM SIGOPS Oper. Syst. Rev., 2009

Serialization sets: a dynamic dependence-based parallel execution model.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Mixed-mode multicore reliability.
Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

Serializing instructions in system-intensive workloads: Amdahl's Law strikes again.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

Adapting to intermittent faults in multicore systems.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

Cooperative cache partitioning for chip multiprocessors.
Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Adapting to Intermittent Faults in Future Multicore Systems.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Cooperative Caching for Chip Multiprocessors.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Program Demultiplexing: Data-flow based Speculative Parallelization of Methods in Sequential Programs.
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006

Computation spreading: employing hardware migration to specialize CMP cores on-the-fly.
Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006

Hardware support for spin management in overcommitted virtual machines.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

Speculative Incoherent Cache Protocols.
IEEE Micro, 2004

Characterization of Problem Stores.
IEEE Comput. Archit. Lett., 2004

Single-Chip Multiprocessors: The Next Wave of Computer Architecture Innovation.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Use-Based Register Caching with Decoupled Indexing.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

Coherence decoupling: making use of incoherence.
Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Exploiting Value Locality in Physical Register Files.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Parallelism in the Front-End.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Reducing Memory Latency via Read-after-Read Memory Dependence Prediction.
IEEE Trans. Computers, 2002

Master/slave speculative parallelization.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

A quantitative framework for automated pre-execution thread selection.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Characterizing and predicting value degree of use.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Out-of-Order Instruction Fetch Using Multiple Sequencers.
Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002

Dynamic dead-instruction detection and elimination.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

Speculative Versioning Cache.
IEEE Trans. Parallel Distributed Syst., 2001

Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling.
Proc. IEEE, 2001

Squash Reuse via a Simplified Implementation of Register Integration.
J. Instr. Level Parallelism, 2001

Speculative Multithreaded Processors.
Computer, 2001

Execution-based prediction using speculative slices.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

A Programmable Co-Processor for Profiling.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Speculative Data-Driven Multithreading.
Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Microprocessors - 10 Years Back, 10 Years Ahead.
Proceedings of the Informatics - 10 Years Back. 10 Years Ahead., 2001

Memory Dependence Prediction in Multimedia Applications.
J. Instr. Level Parallelism, 2000

Register integration: a simple and efficient implementation of squash reuse.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

A static power model for architects.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Understanding the backward slices of performance degrading instructions.
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

Memory Dependence Speculation Tradeoffs in Centralized, Continuous-Window Superscalar Processors.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Speculative Multithreaded Processors.
Proceedings of the High Performance Computing, 2000

Task Selection for the Multiscalar Architecture.
J. Parallel Distributed Comput., 1999

Speculative Memory Cloaking and Bypassing.
Int. J. Parallel Program., 1999

The Use of Multithreading for Exception Handling.
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Read-After-Read Memory Dependence Prediction.
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Effective Jump-Pointer Prefetching for Linked Data Structures.
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999

Improving virtual function call target prediction via dependence-based pre-computation.
Proceedings of the 13th international conference on Supercomputing, 1999

Task Selection for a Multiscalar Processor.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Understanding the Differences Between Value Prediction and Instruction Reuse.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Instruction Issue Logic for High-Performance, Interruptable Pipelined Processors.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998

Retrospective: Multiscalar Processors.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998

Retrospective: Instruction Issue Logic for High-Performance, Interruptable Pipelined Processors.
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers)., 1998

An Empirical Analysis of Instruction Repetition.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

Dependence Based Prefetching for Linked Data Structures.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

Streamlining Inter-Operation Memory Communication via Data Dependence Prediction.
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Dynamic Instruction Reuse.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Dynamic Speculation and Synchronization of Data Dependences.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

Memory Systems.
Proceedings of the Computer Science and Engineering Handbook, 1997

ARB: A Hardware Mechanism for Dynamic Reordering of Memory References.
IEEE Trans. Computers, 1996

High-Bandwidth Address Translation for Multiple-Issue Processors.
Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

The microarchitecture of superscalar processors.
Proc. IEEE, 1995

Zero-cycle loads: microarchitecture support for reducing load latency.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Multiscalar Processors.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Streamlining Data Cache Access with Fast Address Calculation.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Request Combining in Multiprocessors with Arbitrary Interconnection Networks.
IEEE Trans. Parallel Distributed Syst., 1994

Efficient Detection of All Pointer and Array Access Errors.
Proceedings of the ACM SIGPLAN'94 Conference on Programming Language Design and Implementation (PLDI), 1994

The anatomy of the register file in a multiscalar processor.
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994

Guarded Execution and Branch Prediction in Dynamic ILP Processors.
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

High-Bandwidth Interleaved Memories for Vector Processors-A Simulation Study.
IEEE Trans. Computers, 1993

Control flow prediction for dynamic ILP processors.
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

Evaluating Design Choices for Shared Bus Multiprocessors in a Throughput-Oriented Environment.
IEEE Trans. Computers, 1992

Register traffic analysis for streamlining inter-operation communication in fine-grain parallel processors.
Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

The Expandable Split Window Paradigm for Exploiting Fine-Grain Parallelism.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Dynamic Dependency Analysis of Ordinary Programs.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Experience with Mean Value Analysis Models for Evaluating Shared Bus, Throughput-Oriented Multiprocessors.
Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1991

An Empirical Study of the CRAY Y-MP Processor Using the Perfect Club Benchmarks.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

High-Bandwidth Data Memory Systems for Superscalar Processors.
Proceedings of ASPLOS-IV, 1991

The Use of Feedback in Multiprocessors and Its Application to Tree Saturation Control.
IEEE Trans. Parallel Distributed Syst., 1990

The use of intermediate memories for low-latency memory access in supercomputer scalar units.
J. Supercomput., 1990

Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers.
IEEE Trans. Computers, 1990

Design and Analysis of a Gracefully Degrading Interleaved Memory System.
IEEE Trans. Computers, 1990

Scalable Shared-Memory Multiprocessor Architectures.
Computer, 1990

Exploitation of operation-level parallelism in a processor of the CRAY X-MP.
Proceedings of the 1990 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1990

Cache Memory Organization to Enhance the Yield of High-Performance VLSI Processors.
IEEE Trans. Computers, 1989

Performance Analysis of Hierarchical Cache-Consistent Multiprocessors.
Perform. Evaluation, 1989

On the Adequacy of Direct Mapped Caches for Lisp and Prolog Data Reference Patterns.
Proceedings of the Logic Programming, 1989

Using Feedback to Control Tree Saturation in Multistage Interconnection Networks.
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

Restricted Fetch&Phi operations for parallel processing.
Proceedings of the 3rd international conference on Supercomputing, 1989

A study of time-redundant fault tolerance techniques for high-performance pipelined computers.
Proceedings of the Nineteenth International Symposium on Fault-Tolerant Computing, 1989

Tradeoffs in Instruction Format Design for Horizontal Architectures.
Proceedings of ASPLOS-III, 1989

Multiple instruction issue and single-chip processors.
Proceedings of the 21st Annual Workshop and Symposium on Microprogramming and Microarchitecture, San Diego, California, USA, November 28, 1988

The Performance Potential of Multiple Functional Unit Processors.
Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

Organization and Analysis of a Gracefully-Degrading Interleaved Memory System.
Proceedings of the 14th Annual International Symposium on Computer Architecture. Pittsburgh, 1987

Features of the Structured Memory Access (SMA) Architecture.
Proceedings of the Spring COMPCON'86, 1986

Blast: A Machine Architecture for High-Speed List Processing Using Associative Tables (Traversal, Pointers)
PhD thesis, 1985

An Efficient LISP-Execution Architecture with a New Representation for List Structures.
Proceedings of the 12th Annual Symposium on Computer Architecture, 1985
