Krste Asanovic

ORCID: 0000-0003-0754-3975

Affiliations:
  • SiFive, USA
  • University of California at Berkeley, CA, USA
  • MIT, Cambridge, USA (former)


According to our database, Krste Asanovic authored at least 155 papers between 1992 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Awards

ACM Fellow

ACM Fellow 2018, "For contributions to computer architecture, including the open RISC-V instruction set and Agile hardware".

IEEE Fellow

IEEE Fellow 2014, "For contributions to computer architecture".

Bibliography

2024
AuRORA: A Full-Stack Solution for Scalable and Virtualized Accelerator Integration.
IEEE Micro, 2024

FireAxe: Partitioned FPGA-Accelerated Simulation of Large-Scale RTL Designs.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Zoomie: A Software-like Debugging Tool for FPGAs.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Special Issue on Hot Chips 34.
IEEE Micro, 2023

AuRORA: Virtualized Accelerator Orchestration for Multi-Tenant Workloads.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

CDPU: Co-designing Compression and Decompression Processing Units for Hyperscale Systems.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Profiling Hyperscale Big Data Processing.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

MoCA: Memory-Centric, Adaptive Execution for Multi-Tenant Deep Neural Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
An Eight-Core 1.44-GHz RISC-V Vector Processor in 16-nm FinFET.
IEEE J. Solid State Circuits, 2022

Verifying RISC-V Physical Memory Protection.
CoRR, 2022

Hammer: a modular and reusable physical design flow tool: invited.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Cerberus: A Formal Approach to Secure and Efficient Enclave Memory Sharing.
Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022

2021
Accelerating Genomic Data Analytics With Composable Hardware Acceleration Framework.
IEEE Micro, 2021

Accessible, FPGA Resource-Optimized Simulation of Multiclock Systems in FireSim.
IEEE Micro, 2021

A Hardware Accelerator for Protocol Buffers.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

4.3 An Eight-Core 1.44GHz RISC-V Vector Machine in 16nm FinFET.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

COBRA: A Framework for Evaluating Compositions of Hardware Branch Predictors.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Vertically Integrated Computing Labs Using Open-Source Hardware Generators and Cloud-Hosted FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

A 16mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET.
Proceedings of the 47th ESSCIRC 2021, 2021

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
A Dual-Core RISC-V Vector Processor With On-Chip Fine-Grain Power Management in 28-nm FD-SOI.
IEEE Trans. Very Large Scale Integr. Syst., 2020

Chipyard: Integrated Design, Simulation, and Implementation Framework for Custom SoCs.
IEEE Micro, 2020

Building Open Trusted Execution Environments.
IEEE Secur. Priv., 2020

ProTuner: Tuning Programs with Monte Carlo Tree Search.
CoRR, 2020

RLDRM: Closed Loop Dynamic Cache Allocation with Deep Reinforcement Learning for Network Function Virtualization.
Proceedings of the 6th IEEE Conference on Network Softwarization, 2020

AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Genesis: A Hardware Acceleration Framework for Genomic Data Analysis.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Keystone: an open framework for architecting trusted execution environments.
Proceedings of the EuroSys '20: Fifteenth EuroSys Conference 2020, 2020

Invited: Chipyard - An Integrated SoC Research and Implementation Environment.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

An Energy-Efficient RISC-V RV32IMAC Microcontroller for Periodical-Driven Sensing Applications.
Proceedings of the 2020 IEEE Custom Integrated Circuits Conference, 2020

NeuroVectorizer: end-to-end vectorization with deep reinforcement learning.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

FirePerf: FPGA-Accelerated Full-System Hardware/Software Performance Profiling and Co-Design.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019
A Hardware Accelerator for Tracing Garbage Collection.
IEEE Micro, 2019

FireSim: FPGA-Accelerated Cycle-Exact Scale-Out System Simulation in the Public Cloud.
IEEE Micro, 2019

BROOM: An Open-Source Out-of-Order Processor With Resilient Low-Voltage Operation in 28-nm CMOS.
IEEE Micro, 2019

Co-design of deep neural nets and neural net accelerators for embedded vision applications.
IBM J. Res. Dev., 2019

Sanctorum: A lightweight security monitor for secure enclaves.
IACR Cryptol. ePrint Arch., 2019

Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep-Learning Architectures.
CoRR, 2019

Deep Reinforcement Learning in System Optimization.
CoRR, 2019

Keystone: A Framework for Architecting TEEs.
CoRR, 2019

AutoPhase: Compiler Phase-Ordering for High Level Synthesis with Deep Reinforcement Learning.
CoRR, 2019

Simmani: Runtime Power Modeling for Arbitrary RTL with Automatic Signal Selection.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Golden Gate: Bridging The Resource-Efficiency Gap Between ASICs and FPGA Prototypes.
Proceedings of the International Conference on Computer-Aided Design, 2019

Centrifuge: Evaluating full-system HLS-generated heterogenous-accelerator SoCs using FPGA-Acceleration.
Proceedings of the International Conference on Computer-Aided Design, 2019

FPGA Accelerated INDEL Realignment in the Cloud.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

FASED: FPGA-Accelerated Simulation and Evaluation of DRAM.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Open-Source EDA Tools and IP, A View from the Trenches.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
An Out-of-Order RISC-V Processor with Resilient Low-Voltage Operation in 28NM CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

DESSERT: Debugging RTL Effectively with State Snapshotting for Error Replays across Trillions of Cycles.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

Generating the Next Wave of Custom Silicon.
Proceedings of the 44th IEEE European Solid State Circuits Conference, 2018

2017
Reprogrammable Redundancy for SRAM-Based Cache Vmin Reduction in a 28-nm RISC-V Processor.
IEEE J. Solid State Circuits, 2017

A RISC-V Processor SoC With Integrated Power Management at Submicrosecond Timescales in 28 nm FD-SOI.
IEEE J. Solid State Circuits, 2017

Distributed-Memory Breadth-First Search on Massive Graphs.
CoRR, 2017


Reducing Pagerank Communication via Propagation Blocking.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Cyclist: Accelerating hardware development.
Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Return of the Runtimes: Rethinking the Language Runtime System for the Cloud 3.0 Era.
Proceedings of the 16th Workshop on Hot Topics in Operating Systems, 2017

A Hardware Accelerator for Computing an Exact Dot Product.
Proceedings of the 24th IEEE Symposium on Computer Arithmetic, 2017

2016
An Agile Approach to Building RISC-V Microprocessors.
IEEE Micro, 2016

A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI.
IEEE J. Solid State Circuits, 2016

The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V.
CoRR, 2016

Strober: Fast and Accurate Sample-Based Energy Simulation for Arbitrary RTL.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

Vector Processors for Energy-Efficient Embedded Systems.
Proceedings of the Fourth ACM International Workshop on Many-core Embedded Systems, 2016

Sub-microsecond adaptive voltage scaling in a 28nm FD-SOI processor SoC.
Proceedings of the ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference, 2016

Reprogrammable redundancy for cache Vmin reduction in a 28nm RISC-V processor.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2016

Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015
Per-Core DVFS With Switched-Capacitor Converters for Energy Efficiency in Manycore Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Single-chip microprocessor that communicates directly using light.
Nat., 2015

The GAP Benchmark Suite.
CoRR, 2015

A RISC-V vector processor with tightly-integrated switched-capacitor DC-DC converters in 28nm FDSOI.
Proceedings of the Symposium on VLSI Circuits, 2015

GAIL: the graph algorithm iron law.
Proceedings of the 5th Workshop on Irregular Applications - Architectures and Algorithms, 2015

Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Trash Day: Coordinating Garbage Collection in Distributed Systems.
Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

Raven: A 28nm RISC-V vector processor with integrated switched-capacitor DC-DC converters and adaptive clocking.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

DIABLO: A Warehouse-Scale Computer Network Simulator using FPGAs.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Author retrospective for optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C coding methodology.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Joint impact of random variations and RTN on dynamic writeability in 28nm bulk and FDSOI SRAM.
Proceedings of the 44th European Solid State Device Research Conference, 2014

A 45nm 1.3GHz 16.7 double-precision GFLOPS/W RISC-V processor with vector accelerators.
Proceedings of the ESSCIRC 2014, 2014

2013
Exploring the Tradeoffs between Programmability and Efficiency in Data-Parallel Accelerators.
ACM Trans. Comput. Syst., 2013

Direction-optimizing breadth-first search.
Sci. Program., 2013

A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Distributed Memory Breadth-First Search Revisited: Enabling Bottom-Up Search.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

The RISC-V instruction set.
Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

Measuring the gap between programmable and fixed-function accelerators: A case study on speech recognition.
Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

Welcome from general chairs.
Proceedings of the 2013 IEEE Hot Chips 25 Symposium (HCS), 2013

Tessellation: refactoring the OS around explicit resource containers with continuous adaptation.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

Convergence and scalarization for data-parallel architectures.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

PHANTOM: practical oblivious computation in a secure processor.
Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, 2013

2012
SRAM Assist Techniques for Operation in a Wide Voltage Range in 28-nm CMOS.
IEEE Trans. Circuits Syst. II Express Briefs, 2012

Globally Synchronized Frames for guaranteed quality-of-service in on-chip networks.
J. Parallel Distributed Comput., 2012

Designing Chip-Level Nanophotonic Interconnection Networks.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Context-centric Security.
Proceedings of the 7th USENIX Workshop on Hot Topics in Security, 2012

GPUs as an opportunity for offloading garbage collection.
Proceedings of the International Symposium on Memory Management, 2012

Chisel: constructing hardware in a Scala embedded language.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

2011
Real-time Musical Applications on an Experimental Operating System for Multi-Core Processors.
Proceedings of the 2011 International Computer Music Conference, 2011

The Maven vector-thread architecture.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011

Tessellation operating system: Building a real-time, responsive, high-throughput client OS for many-core architectures.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011

2010
Guest Editors' Introduction: Hot Chips 21.
IEEE Micro, 2010

Composing parallel software efficiently with lithe.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

A case for FAME: FPGA architecture model execution.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Re-architecting DRAM memory systems with monolithically integrated silicon photonics.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

RAMP gold: an FPGA-based architecture simulator for multiprocessors.
Proceedings of the 47th Design Automation Conference, 2010

2009
Building Many-Core Processor-to-DRAM Networks with Monolithic CMOS Silicon Photonics.
IEEE Micro, 2009

A view of the parallel computing landscape.
Commun. ACM, 2009

Silicon-photonic clos networks for global on-chip communication.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Designing multi-socket systems using silicon photonics.
Proceedings of the 23rd international conference on Supercomputing, 2009

2008
Implementing the scale vector-thread processor.
ACM Trans. Design Autom. Electr. Syst., 2008

MEMOCODE 2008 Co-Design Contest.
Proceedings of the 6th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2008), 2008

Building Manycore Processor-to-DRAM Networks with Monolithic Silicon Photonics.
Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Compiling for vector-thread architectures.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

2007
Activity-Sensitive Flip-Flop and Latch Selection for Reduced Energy.
IEEE Trans. Very Large Scale Integr. Syst., 2007

RAMP: Research Accelerator for Multiple Processors.
IEEE Micro, 2007

Continual hashing for efficient fine-grain state inconsistency detection.
Proceedings of the 25th International Conference on Computer Design, 2007

Transactors for parallel hardware and software co-design.
Proceedings of the IEEE International High Level Design Validation and Test Workshop, 2007

2006
Energy-aware lossless data compression.
ACM Trans. Comput. Syst., 2006

Unbounded Transactional Memory.
IEEE Micro, 2006

Rethinking Hardware Support for Network Analysis and Intrusion Prevention.
Proceedings of the 1st USENIX Workshop on Hot Topics in Security, 2006

METERG: Measurement-Based End-to-End Performance Estimation Technique in QoS-Capable Multiprocessors.
Proceedings of the 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS 2006), 2006

Accelerating architectural exploration using canonical instruction segments.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Branch trace compression for snapshot-based simulation.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Implementing virtual memory in a vector processor with software restart markers.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Research accelerator for multiple processors.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006

2005
A Speculative Control Scheme for an Energy-Efficient Banked Register File.
IEEE Trans. Computers, 2005

Controlling program execution through binary instrumentation.
SIGARCH Comput. Archit. News, 2005

Mondrix: memory isolation for linux using mondriaan memory protection.
Proceedings of the 20th ACM Symposium on Operating Systems Principles 2005, 2005

Accelerating Multiprocessor Simulation with a Memory Timestamp Record.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Replacing global wires with an on-chip network: a power analysis.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004
The Vector-Thread Architecture.
IEEE Micro, 2004

Cache Refill/Access Decoupling for Vector Machines.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Power-optimal pipelining in deep submicron technology.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

2003
Reducing power density through activity migration.
Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Banked Multiported Register Files for High-Frequency Superscalar Microprocessors.
Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

Hardware Works, Software Doesn't: Enforcing Modularity with Mondriaan Memory Protection.
Proceedings of HotOS'03: 9th Workshop on Hot Topics in Operating Systems, 2003

2002
Fine-grain CAM-tag cache resizing using miss tags.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Dynamic Fine-Grain Leakage Reduction Using Leakage-Biased Bitlines.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Mondrian memory protection.
Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X), 2002

2001
Multithreading decoupled architectures for complexity-effective general purpose computing.
SIGARCH Comput. Archit. News, 2001

Direct addressed caches for reduced power consumption.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Heads and tails: a variable-length instruction format supporting parallel fetch and decode.
Proceedings of the 2001 International Conference on Compilers, 2001

2000
Energy-Efficient Register Access.
Proceedings of the 13th Annual Symposium on Integrated Circuits and Systems Design, 2000

Dynamic zero compression for cache energy reduction.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

1997
Scalable Processors in the Billion-Transistor Era: IRAM.
Computer, 1997

A Fast Kohonen Net Implementation for Spert-II.
Proceedings of the Biological and Artificial Computation: From Neuroscience to Technology, 1997

Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology.
Proceedings of the 11th international conference on Supercomputing, 1997

Intelligent RAM (IRAM): The Industrial Setting, Applications and Architectures.
Proceedings of the 1997 International Conference on Computer Design: VLSI in Computers & Processors, 1997

Using PHiPAC to speed error back-propagation learning.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

1996
Spert-II: A Vector Microprocessor System.
Computer, 1996

1995
SPERT-II: A Vector Microprocessor System and its Application to Large Problems in Backpropagation Training.
Proceedings of the Advances in Neural Information Processing Systems 8, 1995

1993
Using simulations of reduced precision arithmetic to design a neuro-microprocessor.
J. VLSI Signal Process., 1993

The design of a neuro-microprocessor.
IEEE Trans. Neural Networks, 1993

Designing A Connectionist Network Supercomputer.
Int. J. Neural Syst., 1993

1992
SPERT: a VLIW/SIMD microprocessor for artificial neural network computations.
Proceedings of the Application Specific Array Processors, 1992

