Stephen W. Keckler

Abhimanyu Rajeshkumar Bambhaniya

Gururaj Saileshwar

CoRR, 2024

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition.

Geonhwa Jeong

Po-An Tsai

Tushar Krishna

CoRR, 2024

Vision Transformer Computation and Resilience for Dynamic Inference.

Kavya Sreedhar

Mark Horowitz

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

PrIDE: Achieving Secure Rowhammer Mitigation with Low-Cost In-DRAM Trackers.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization.

Neal Clayton Crago

Sana Damani

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

2023

Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.

ACM Trans. Comput. Syst., 2023

cuCatch: A Debugging Tool for Efficiently Catching Memory Safety Violations in CUDA Applications.

Mohamed Tarek Ibn Ziad

Proc. ACM Program. Lang., 2023

Augmenting Legacy Networks for Flexible Inference.

Proceedings of the IEEE Intelligent Vehicles Symposium, 2023

Community-based Matrix Reordering for Sparse Linear Algebra Optimization.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

Implicit Memory Tagging: No-Overhead Memory Safety Using Alias-Free Tagged ECC.

Mohamed Tarek Ibn Ziad

Aamer Jaleel

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

VaPr: Variable-Precision Tensors to Accelerate Robot Motion Planning.

Yu-Shun Hsiao

Balakumar Sundaralingam

IROS, 2023

2022

Making Convolutions Resilient Via Algorithm-Based Error Detection Techniques.

IEEE Trans. Dependable Secur. Comput., 2022

GPU Domain Specialization via Composable On-Package Architecture.

ACM Trans. Archit. Code Optim., 2022

Characterizing and Mitigating Soft Errors in GPU DRAM.

IEEE Micro, 2022

Enabling and Accelerating Dynamic Vision Transformer Inference for Real-Time Applications.

Kavya Sreedhar

Mark Horowitz

CoRR, 2022

Accelerators.

Steve Keckler

Dejan S. Milojicic

Computer, 2022

Saving PAM4 Bus Energy with SMOREs: Sparse Multi-level Opportunistic Restricted Encodings.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

GPU Subwarp Interleaving.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Exploiting Temporal Data Diversity for Detecting Safety-critical Faults in AV Compute Systems.

Saurabh Jha

Shengkun Cui

Zbigniew T. Kalbarczyk

Ravishankar K. Iyer

Proceedings of the 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2022

Zhuyi: perception processing rate estimation for safety in autonomous vehicles.

Yu-Shun Hsiao

Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

2021

Evolution of the Graphics Processing Unit (GPU).

David Blair Kirk

IEEE Micro, 2021

SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference.

IEEE J. Solid State Circuits, 2021

Cooperative Profile Guided Optimizations.

Mark Stephenson

Ram Rangan

Comput. Graph. Forum, 2021

Simba: scaling deep-learning inference with chiplet-based architecture.

Yakun Sophia Shao

Commun. ACM, 2021

Generating and Characterizing Scenarios for Safety Testing of Autonomous Vehicles.

Zahra Ghodsi

Iuri Frosio

Alejandro J. Troccoli

Siddharth Garg

Anima Anandkumar

Proceedings of the IEEE Intelligent Vehicles Symposium, 2021

Suraksha: A Framework to Analyze the Safety Implications of Perception Design Choices in AVs.

Hengyu Zhao

Proceedings of the 32nd IEEE International Symposium on Software Reliability Engineering, 2021

Optimizing Selective Protection for CNN Resilience.

Abdulrahman Mahmoud

Christopher W. Fletcher

Proceedings of the 32nd IEEE International Symposium on Software Reliability Engineering, 2021

Suraksha: A Quantitative AV Safety Evaluation Framework to Analyze Safety Implications of Perception Design Choices.

Hengyu Zhao

Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2021

NVBitFI: Dynamic Fault Injection for GPUs.

Oreste Villa

Proceedings of the 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2021

2020

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.

Brian Zimmer

IEEE J. Solid State Circuits, 2020

Estimating Silent Data Corruption Rates Using a Two-Level Model.

CoRR, 2020

HarDNN: Feature Map Vulnerability Evaluation in CNNs.

Abdulrahman Mahmoud

Christopher W. Fletcher

CoRR, 2020

Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Speculative reconvergence for improved SIMT efficiency.

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs.

Neal Clayton Crago

Mark Stephenson

ACM Trans. Archit. Code Optim., 2019

Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors.

Saurabh Jha

CoRR, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.

Brian Zimmer

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

SNAP: A 1.67 - 21.55TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS.

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.

Yakun Sophia Shao

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.

Brucek Khailany

Joel S. Emer

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

GPU snapshot: checkpoint offloading for GPU-dense systems.

Kyushick Lee

Mattan Erez

Proceedings of the ACM International Conference on Supercomputing, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.

Proceedings of the International Conference on Computer-Aided Design, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

On the Trend of Resilience for GPU-Dense Systems.

Kyushick Lee

Mattan Erez

Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection.

Saurabh Jha

Subho S. Banerjee

Zbigniew T. Kalbarczyk

Ravishankar K. Iyer

Proceedings of the 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.

Christopher W. Fletcher

Joel S. Emer

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Software-Directed Techniques for Improved GPU Register File Utilization.

ACM Trans. Archit. Code Optim., 2018

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training.

CoRR, 2018

Optimizing software-directed instruction replication for GPU error detection.

Abdulrahman Mahmoud

Proceedings of the International Conference for High Performance Computing, 2018

SwapCodes: Error Codes for Hardware-Software Cooperative GPU Pipeline Error Detection.

Brian Zimmer

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks.

CoRR, 2017

Understanding error propagation in deep learning neural network (DNN) accelerators and applications.

Guanpeng Li

Proceedings of the International Conference for High Performance Computing, 2017

Fine-grained DRAM: energy-efficient DRAM for extreme bandwidth systems.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation.

Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Architecting an Energy-Efficient DRAM System for GPUs.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016

Virtualizing Deep Neural Networks for Memory-Efficient Neural Network Design.

CoRR, 2016

vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

A patch memory system for image processing and computer vision.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

CLARA: Circular Linked-List Auto and Self Refresh Architecture.

Proceedings of the Second International Symposium on Memory Systems, 2016

Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Towards high performance paged memory for GPUs.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

A case for toggle-aware compression for GPU systems.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Selective GPU caches to eliminate CPU-GPU HW cache coherence.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

A real-time energy-efficient superpixel hardware accelerator for mobile computer vision applications.

Injoon Hong

Iuri Frosio

Brucek Khailany

Proceedings of the 53rd Annual Design Automation Conference, 2016

2015

Designing Efficient Heterogeneous Memory Architectures.

IEEE Micro, 2015

Increasing Interconnection Network Throughput with Virtual Channels.

Computer, 2015

Toggle-Aware Compression for GPUs.

IEEE Comput. Archit. Lett., 2015

Anatomy of GPU Memory System for Multi-Application Execution.

Proceedings of the 2015 International Symposium on Memory Systems, 2015

Flexible software profiling of GPU architectures.

Mark Stephenson

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

A variable warp size architecture.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

GPU Computing Pipeline Inefficiencies and Optimization Opportunities in Heterogeneous CPU-GPU Processors.

Joel Hestness

David A. Wood

Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Unlocking bandwidth for GPUs in CC-NUMA systems.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Page Placement Strategies for GPUs within Heterogeneous Memory Systems.

Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014

Scaling Power and Performance via Processor Composability.

IEEE Trans. Computers, 2014

2014 International Symposium on Computer Architecture Influential Paper Award; 2014 Maurice Wilkes Award Given to Ravi Rajwar.

Dean M. Tullsen

IEEE Micro, 2014

Rethinking caches for throughput processors: technical perspective.

Commun. ACM, 2014

Scaling the Power Wall: A Path to Exascale.

Proceedings of the International Conference for High Performance Computing, 2014

Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Arbitrary Modulus Indexing.

Jeffrey R. Diamond

Donald S. Fussell

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

A comparative analysis of microarchitecture effects on CPU and GPU memory system behavior.

Joel Hestness

David A. Wood

Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Author retrospective for a NUCA substrate for flexible CMP cache sharing.

Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications.

Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

2013

How to implement effective prediction and forwarding for fusable dynamic multicore architectures.

Behnam Robatmili

Dong Li

Hadi Esmaeilzadeh

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

21st century digital design tools.

Chris Malachowsky

Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Convergence and scalarization for data-parallel architectures.

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors.

ACM Trans. Comput. Syst., 2012

A QoS-Enabled On-Die Interconnect Fabric for Kilo-Node Chips.

IEEE Micro, 2012

Charles R. (Chuck) Moore (1961 - 2012).

Mark Papermaster

IEEE Micro, 2012

Massively Multithreaded Computing Systems.

Steven K. Reinhardt

Computer, 2012

Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

2011

GPUs and the Future of Parallel Computing.

IEEE Micro, 2011

A compile-time managed multi-level register file hierarchy.

Mark Gebhart

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Evaluation and optimization of multicore performance bottlenecks in supercomputing applications.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Energy-efficient mechanisms for managing thread context in throughput processors.

Proceedings of the 38th International Symposium on Computer Architecture (ISCA 2011), 2011

Exploiting criticality to reduce bottlenecks in distributed uniprocessors.

Behnam Robatmili

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

2010

Netrace: dependency-driven trace-based network-on-chip simulation.

Joel Hestness

Proceedings of the Third International Workshop on Network on Chip Architectures, 2010

Topology-Aware Quality-of-Service Support in Highly Integrated Chip Multiprocessors.

Onur Mutlu

Proceedings of the Computer Architecture, 2010

2009

On-Chip Networks for Multicore Systems.

Li-Shiuan Peh

Sriram R. Vangal

Multicore Processors and Systems, 2009

Composable Multicore Chips.

Simha Sethumadhavan

Multicore Processors and Systems, 2009

Segment gating for static energy reduction in Networks-on-Chip.

Kyle C. Hale

Proceedings of the Second International Workshop on Network on Chip Architectures, 2009

Preemptive virtual clock: a flexible, efficient, and cost-effective QOS scheme for networks-on-chip.

Onur Mutlu

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Analysis of the TRIPS prototype block predictor.

Nitya Ranganathan

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

End-to-end validation of architectural power models.

Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009

Express Cube Topologies for on-Chip Interconnects.

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

An evaluation of the TRIPS computer system.

Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, 2009

2008

Multitasking workload scheduling on flexible core chip multiprocessors.

SIGARCH Comput. Archit. News, 2008

High performance dense linear algebra on a spatially distributed processor.

Jeffrey R. Diamond

Behnam Robatmili

Robert A. van de Geijn

Kazushige Goto

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Counting Dependence Predictors.

Franziska Roesner

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Regional congestion awareness for load balance in networks-on-chip.

Paul Gratz

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

2007

A NUCA Substrate for Flexible CMP Cache Sharing.

IEEE Trans. Parallel Distributed Syst., 2007

Research Challenges for On-Chip Interconnection Networks.

John D. Owens

Doddaballapur Narasimha-Murthy Jayasimha

Ron Ho

Li-Shiuan Peh

IEEE Micro, 2007

On-Chip Interconnection Networks of the TRIPS Chip.

Paul Gratz

Changkyu Kim

Heather Hanson

IEEE Micro, 2007

Reconciling performance and programmability in networking systems.

Jayaram Mudigonda

Harrick M. Vin

Proceedings of the ACM SIGCOMM 2007 Conference on Applications, 2007

Implementation and Evaluation of a Dynamically Routed Processor Operand Network.

Paul Gratz

Heather Hanson

Robert G. McDonald

Proceedings of the First International Symposium on Networks-on-Chips, 2007

Composable Lightweight Processors.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Thermal response to DVFS: analysis with an Intel Pentium M.

Freeman L. Rawson III

Juan Rubio

Proceedings of the 2007 International Symposium on Low Power Electronics and Design, 2007

Late-binding: enabling unordered load-store queues.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Power, Performance, and Thermal Management for High-Performance Systems.

Freeman L. Rawson III

Juan Rubio

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

The future of multi-core technologies.

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006

Dataflow Predication.

Aaron Smith

Ramadass Nagarajan

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Distributed Microarchitectural Protocols in the TRIPS Prototype Processor.

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Decomposing memory performance: data structures and phases.

Proceedings of the 5th International Symposium on Memory Management, 2006

Critical path analysis of the TRIPS architecture.

Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Design and Implementation of the TRIPS Primary Memory System.

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Implementation and Evaluation of On-Chip Network Architectures.

Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

2004

TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP.

ACM Trans. Archit. Code Optim., 2004

Recent extensions to the SimpleScalar tool suite.

Todd M. Austin

SIGMETRICS Perform. Evaluation Rev., 2004

Scalable Hardware Memory Disambiguation for High-ILP Processors.

IEEE Micro, 2004

Scaling to the End of Silicon with EDGE Architectures.

Computer, 2004

Scalable selective re-execution for EDGE architectures.

Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems, 2004

Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures.

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004

2003

Static energy reduction techniques for microprocessor caches.

IEEE Trans. Very Large Scale Integr. Syst., 2003

Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture.

IEEE Micro, 2003

Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches.

Changkyu Kim

IEEE Micro, 2003

Universal Mechanisms for Data-Parallel Architectures.

William R. Mark

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Microprocessor pipeline energy analysis.

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Exploiting Microarchitectural Redundancy For Defect Tolerance.

Charles R. Moore

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Routed Inter-ALU Networks for ILP Scalability and Performance.

Vincent Ajay Singh

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

2002

Errata on "Measuring Experimental Error in Microprocessor Simulation".

SIGARCH Comput. Archit. News, 2002

The Optimal Logic Depth Per Pipeline Stage is 6 to 8 FO4 Inverter Delays.

M. S. Hrishikesh

Norman P. Jouppi

Keith I. Farkas

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic.

Proceedings of the 2002 International Conference on Dependable Systems and Networks (DSN 2002), 2002

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches.

Changkyu Kim

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001

A design space evaluation of grid processor architectures.

Ramadass Nagarajan

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Measuring Experimental Error in Microprocessor Simulation.

Rajagopalan Desikan

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Exploring the Design Space of Future CMPs.

Jaehyuk Huh

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

The impact of delay on the design of branch predictors.

Daniel A. Jiménez

Calvin Lin

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Processor Mechanisms for Software Shared Memory.

Proceedings of the High Performance Computing, Third International Symposium, 2000

Clock rate versus IPC: the end of the road for conventional microarchitectures.

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

1999

Concurrent Event Handling through Multithreading.

IEEE Trans. Computers, 1999

1998

Fast thread communication and synchronization mechanisms for a scalable single chip multiprocessor.

PhD thesis, 1998

An Efficient, Protected Message Interface.

Computer, 1998

Exploiting Fine-grain Thread Level Parallelism on the MIT Multi-ALU Processor.

Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

The effects of explicitly parallel mechanisms on the multi-ALU processor cluster pipeline.

Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors, 1998

1997

The M-machine multicomputer.

Int. J. Parallel Program., 1997

1994

Hardware Support for Fast Capability-based Addressing.

Nicholas P. Carter

Proceedings of ASPLOS-VI, 1994

1992

Processor Coupling: Integrating Compile Time and Runtime Scheduling for Parallelism.
