Joel S. Emer

Commun. ACM, 2021

Mentoring Opportunities in Computer Architecture: Analyzing the Past to Develop the Future.

Proceedings of the ACM/IEEE Workshop on Computer Architecture Education, 2021

Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Architecture-Level Energy Estimation for Heterogeneous Computing Systems.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

SpZip: Architectural Support for Effective Data Compression In Irregular Applications.

Yifan Yang

Daniel Sánchez

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Gamma: leveraging Gustavson's algorithm to accelerate sparse matrix multiplication.

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

Efficient Processing of Deep Neural Networks

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01766-7, 2020

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.

Brian Zimmer

IEEE J. Solid State Circuits, 2020

Freely scalable and reconfigurable optical hardware for deep learning.

CoRR, 2020

Estimating Silent Data Corruption Rates Using a Two-Level Model.

Siva Kumar Sastry Hari

CoRR, 2020

An Architecture-Level Energy and Area Estimator for Processing-In-Memory Accelerator Designs.

Yannan Nellie Wu

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

2019

Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.

Brian Zimmer

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.

Yakun Sophia Shao

Jason Clemons

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

ExTensor: An Accelerator for Sparse Tensor Algebra.

Kartik Hegde

Hadi Asghari Moghaddam

Christopher W. Fletcher

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.

Brucek Khailany

Stephen W. Keckler

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs.

Yannan Nellie Wu

Proceedings of the International Conference on Computer-Aided Design, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.

Proceedings of the International Conference on Computer-Aided Design, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.

Stephen W. Keckler

Christopher W. Fletcher

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

DAWG: A Defense Against Cache Timing Attacks in Speculative Execution Processors.

IACR Cryptol. ePrint Arch., 2018

Eyeriss v2: A Flexible and High-Performance Accelerator for Emerging Deep Neural Networks.

Yu-Hsin Chen

CoRR, 2018

Harmonizing Speculative and Non-Speculative Execution in Architectures for Ordered Parallelism.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A modular digital VLSI flow for high-productivity SoC design.

Brucek Khailany

Evgeni Krimer

Proceedings of the 55th Annual Design Automation Conference, 2018

Hardware for machine learning: Challenges and opportunities.

Proceedings of the 2018 IEEE Custom Integrated Circuits Conference, 2018

2017

(FPL 2015) Scavenger: Automating the Construction of Application-Optimized Memory Hierarchies.

ACM Trans. Reconfigurable Technol. Syst., 2017

Efficient Processing of Deep Neural Networks: A Tutorial and Survey.

Proc. IEEE, 2017

Using Dataflow to Optimize Energy Efficiency of Deep Neural Network Accelerators.

Yu-Hsin Chen

IEEE Micro, 2017

Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks.

IEEE J. Solid State Circuits, 2017

Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision.

CoRR, 2017

Understanding error propagation in deep learning neural network (DNN) accelerators and applications.

Guanpeng Li

Siva Kumar Sastry Hari

Proceedings of the International Conference for High Performance Computing, 2017

SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation.

Siva Kumar Sastry Hari

Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Towards closing the energy gap between HOG and CNN features for embedded vision (Invited paper).

Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Fractal: An Execution Model for Fine-Grain Nested Speculative Parallelism.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Automatic Construction of Program-Optimized FPGA Memory Networks.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A method to estimate the energy consumption of deep neural networks.

Proceedings of the 51st Asilomar Conference on Signals, Systems, and Computers, 2017

SAM: Optimizing Multithreaded Cores for Speculative Parallelism.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Unlocking Ordered Parallelism with the Swarm Architecture.

IEEE Micro, 2016

Data-centric execution of speculative parallel programs.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

CLARA: Circular Linked-List Auto and Self Refresh Architecture.

Proceedings of the Second International Symposium on Memory Systems, 2016

14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks.

Proceedings of the 2016 IEEE International Solid-State Circuits Conference, 2016

Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks.

Yu-Hsin Chen

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

LMC: Automatic Resource-Aware Program-Optimized Memory Partitioning.

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015

Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.

ACM Trans. Comput. Syst., 2015

A fast and accurate analytical technique to compute the AVF of sequential bits in a processor.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

A scalable architecture for ordered parallelism.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

High performing cache hierarchies for server workloads: Relaxing inclusion to capture the latency benefits of exclusive caches.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Scavenger: Automating the construction of application-optimized memory hierarchies.

Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

2014

Efficient Spatial Processing Element Control via Triggered Instructions.

IEEE Micro, 2014

Exploiting spatial architectures for edit distance algorithms.

Jesmin Jahan Tithi

Neal Clayton Crago

Hashem Hashemi Najaf-abadi

Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

The LEAP FPGA operating system.

Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

LEAP Shared Memories: Automating the Construction of FPGA Coherent Memories.

Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

2013

Using in-flight chains to build a scalable cache coherence protocol.

Samantika Subramaniam

ACM Trans. Archit. Code Optim., 2013

Triggered instructions: a control paradigm for spatially-programmed architectures.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

A Hierarchical Architectural Framework for Reconfigurable Logic Computing.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Optimizing under abstraction: Using prefetching to improve FPGA performance.

Proceedings of the 23rd International Conference on Field Programmable Logic and Applications, 2013

2012

The gradient-based cache partitioning algorithm.

ACM Trans. Archit. Code Optim., 2012

Scheduling heterogeneous multi-cores through performance impact estimation (PIE).

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

ZIP-IO: Architecture for application-specific compression of Big Data.

Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

Leveraging latency-insensitivity to ease multiple FPGA design.

Kermin Elliott Fleming

Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

CRUISE: cache replacement and utility-aware scheduling.

Aamer Jaleel

Samantika Subramaniam

Simon C. Steely Jr.

Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012

2011

DEC Alpha.

Tryggve Fossum

Encyclopedia of Parallel Computing, 2011

PACMan: prefetch-aware cache management for high performance caching.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

SHiP: signature-based hit predictor for high performance caching.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Leap scratchpads: automatic memory and cache management for reconfigurable logic.

Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

2010

The Future of Architectural Simulation.

IEEE Micro, 2010

Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies.

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Design contest overview: Combined architecture for network stream categorization and intrusion detection (CANSCID).

Forrest Brewer

Proceedings of the 8th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010), 2010

High performance cache replacement using re-reference interval prediction (RRIP).

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

2009

A-Port Networks: Preserving the Timed Behavior of Synchronous Systems for Modeling on FPGAs.

Michael Pellauer

Michael Adler

Arvind

ACM Trans. Reconfigurable Technol. Syst., 2009

Guest Editors' Introduction: Top Picks from the 2008 Computer Architecture Conferences.

Dean M. Tullsen

IEEE Micro, 2009

Accelerating architecture research.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

CAMP: A technique to estimate per-structure power at run-time using a few simple parameters.

Michael D. Powell

Arijit Biswas

Basit Riaz Sheikh

Shrirang M. Yardi

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Soft connections: addressing the hardware-design modularity problem.

Proceedings of the 46th Design Automation Conference, 2009

2008

Set-Dueling-Controlled Adaptive Insertion for High-Performance Caching.

IEEE Micro, 2008

Computing Accurate AVFs using ACE Analysis on Performance Models: A Rebuttal.

Arijit Biswas

Paul Racunas

IEEE Comput. Archit. Lett., 2008

Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs.

Michael Pellauer

Michael Adler

Arvind

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

A-Ports: an efficient abstraction for cycle-accurate performance models on FPGAs.

Michael Pellauer

Michael Adler

Arvind

Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

Adaptive insertion policies for managing shared caches.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Single-Threaded vs. Multithreaded: Where Should We Focus?

IEEE Micro, 2007

Late-binding: enabling unordered load-store queues.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Adaptive insertion policies for high performance caching.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

2005

Computing Architectural Vulnerability Factors for Address-Based Structures.

Ram Rangan

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

The Soft Error Problem: An Architectural Perspective.

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

2004

Reducing the Soft-Error Rate of a High-Performance Microprocessor.

IEEE Micro, 2004

Cache Scrubbing in Microprocessors: Myth or Necessity?

Tryggve Fossum

Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2004), 2004

Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor.

Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

2003

Measuring Architectural Vulnerability Factors.

Todd M. Austin

IEEE Micro, 2003

A Systematic Methodology to Compute the Architectural Vulnerability Factors for a High-Performance Microprocessor.

Todd M. Austin

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

2002

Performance Simulation Tools.

Computer, 2002

Asim: A Performance Model Framework.

Computer, 2002

Tarantula: A Vector Extension to the Alpha Architecture.

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Loose Loops Sink Chips.

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

A comparative study of arbitration algorithms for the Alpha 21364 pipelined router.

Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2000

Combining Static and Dynamic Branch Prediction to Reduce Destructive Aliasing.

Harish Patil

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999

The Use of Multithreading for Exception Handling.

Craig B. Zilles

Gurindar S. Sohi

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Reducing cache misses using hardware and software page placement.

Timothy Sherwood

Brad Calder

Proceedings of the 13th International Conference on Supercomputing, 1999

1998

A Characterization of Processor Performance in the VAX-11/780.

Douglas W. Clark

Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Retrospective: Characterization of Processor Performance in the VAX-11/780.

Douglas W. Clark

Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998

Memory Dependence Prediction Using Store Sets.

George Z. Chrysos

Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998

1997

Converting Thread-Level Parallelism to Instruction-Level Parallelism via Simultaneous Multithreading.

ACM Trans. Comput. Syst., 1997

Simultaneous multithreading: a platform for next-generation processors.

IEEE Micro, 1997

A Language for Describing Predictors and Its Application to Automatic Synthesis.

Nicholas C. Gloy

Proceedings of the 24th International Symposium on Computer Architecture, 1997

1996

Incremental Versus Revolutionary Research.

ACM Comput. Surv., 1996

Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor.

Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Predictive Sequential Associative Cache.

Brad Calder

Dirk Grunwald

Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

1995

A system level perspective on branch architecture performance.

Brad Calder

Dirk Grunwald

Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Instruction Fetching: Coping with Code Bloat.

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

1989

Performance Analysis of Mass Storage Service Alternatives for Distributed Systems.

K. K. Ramakrishnan

IEEE Trans. Software Eng., 1989

1988

Performance Considerations for Distributed Services: A Case Study: Mass Storage.

K. K. Ramakrishnan

Proceedings of the 8th International Conference on Distributed Computing Systems, 1988

1986

Design analysis of a heterogeneous distributed system.

K. K. Ramakrishnan

Proceedings of the 2nd ACM SIGOPS European Workshop, 1986

1985

Performance of the VAX-11/780 Translation Buffer: Simulation and Measurement

Douglas W. Clark

ACM Trans. Comput. Syst., 1985

1979

Shared Resources for Multiple Instruction Stream Pipelined Processors
