Yakun Sophia Shao

2024

CoRR, 2024

Virgo: Cluster-level Matrix Unit Integration in GPUs for Scalability and Energy Efficiency.

CoRR, 2024

LLM-Aided Compilation for Tensor Accelerators.

CoRR, 2024

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets.

CoRR, 2024

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization.

CoRR, 2024

Stellar: An Automated Design Framework for Dense and Sparse Spatial Accelerators.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Design Approach for Die-to-Die Interfaces to Enable Energy-Efficient Chiplet Systems.

Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, 2024

FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Next-Generation Domain-Specific Accelerators: From Hardware to System.

Parthasarathy Ranganathan

Proceedings of the IEEE Custom Integrated Circuits Conference, 2024

2023

RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation.

Dataset, June, 2023

Guest Editorial Introduction to the Special Issue on the 2022 IEEE International Solid-State Circuits Conference (ISSCC).

IEEE J. Solid State Circuits, 2023

SPEED: Speculative Pipelined Execution for Efficient Decoding.

CoRR, 2023

Code Transpilation for Hardware Accelerators.

CoRR, 2023

Full Stack Optimization of Transformer Inference: a Survey.

CoRR, 2023

AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

RoSÉ: A Hardware-Software Co-Simulation Infrastructure Enabling Pre-Silicon Full-Stack Robotics SoC Evaluation.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems.

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks.

Seah Kim

Hasan Genc

Vadim Vadimovich Nikiforov

Krste Asanovic

Borivoje Nikolic

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

Efficient emotion recognition using hyperdimensional computing with combinatorial channel encoding and cellular automata.

Brain Informatics, 2022

Learning A Continuous and Reconstructible Latent Space for Hardware Accelerator Design.

Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

2021

SNAP: An Efficient Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference.

IEEE J. Solid State Circuits, 2021

Simba: scaling deep-learning inference with chiplet-based architecture.

Jason Clemons

Commun. ACM, 2021

Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

A 16mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET.

Proceedings of the 47th ESSCIRC 2021, 2021

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration.

Jonathan Ragan-Kelley

Krste Asanovic

Borivoje Nikolic

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

Commercial Products.

David A. Patterson

IEEE Micro, 2020

Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs.

IEEE Micro, 2020

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm.

Brian Zimmer

IEEE J. Solid State Circuits, 2020

Invited: Chipyard - An Integrated SoC Research and Implementation Environment.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

NeuroVectorizer: end-to-end vectorization with deep reinforcement learning.

Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019

Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures.

CoRR, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm.

Brian Zimmer

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

SNAP: A 1.67 - 21.55TOPS/W Sparse Neural Acceleration Processor for Unstructured Sparse Deep Neural Network Inference in 16nm CMOS.

Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019

Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture.

Jason Clemons

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Timeloop: A Systematic Approach to DNN Accelerator Evaluation.

Brucek Khailany

Stephen W. Keckler

Joel S. Emer

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

MAGNet: A Modular Accelerator Generator for Neural Networks.

Proceedings of the International Conference on Computer-Aided Design, 2019

A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology.

Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.

Stephen W. Keckler

Christopher W. Fletcher

Joel S. Emer

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Assisting High-Level Synthesis Improve SpMV Benchmark Through Dynamic Dependence Analysis.

IEEE Trans. Circuits Syst. II Express Briefs, 2018

Hardware Acceleration.

Martha A. Kim

IEEE Micro, 2018

A modular digital VLSI flow for high-productivity SoC design.

Brucek Khailany

Evgeni Krimer

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

Methods and infrastructure in the era of accelerator-centric architectures.

Proceedings of the IEEE 60th International Midwest Symposium on Circuits and Systems, 2017

Using dynamic dependence analysis to improve the quality of high-level synthesis designs.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

2016

Co-designing accelerators and SoC interfaces using gem5-Aladdin.

Sam Likun Xi

Vijayalakshmi Srinivasan

Gu-Yeon Wei

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

2015

Research Infrastructures for Hardware Accelerators.

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01750-6, 2015

The Aladdin Approach to Accelerator Design and Modeling.

IEEE Micro, 2015

2014

Aladdin: A pre-RTL, power-performance accelerator simulator enabling large design space exploration of customized architectures.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

MachSuite: Benchmarks for accelerator design and customized architectures.

Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

2013

ISA-independent workload characterization and its implications for specialized architectures.

Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Energy characterization and instruction-level energy model of Intel's Xeon Phi processor.
