CoRR, January, 2025

2024

Systems Challenges and Opportunities for Genomics.

[DOI]

Satish Narayanasamy

Computer, August, 2024

Duet: A Collaborative User Driven Recommendation System for Edge Devices.

[DOI]

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023

nPoRe: n-polymer realigner for improved pileup-based variant calling.

[DOI]

BMC Bioinform., December, 2023

BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks.

[DOI]

ACM Trans. Embed. Comput. Syst., October, 2023

Eidetic: An In-Memory Matrix Multiplication Accelerator for Neural Networks.

[DOI]

IEEE Trans. Computers, June, 2023

GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis.

[DOI]

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Vector-Processing for Mobile Devices: Benchmark and Analysis.

[DOI]

Proceedings of the IEEE International Symposium on Workload Characterization, 2023

2022

Hardware-friendly User-specific Machine Learning for Edge Devices.

[DOI]

ACM Trans. Embed. Comput. Syst., September, 2022

Special Issue on In-Memory Computing.

[DOI]

IEEE Micro, 2022

Multi-Layer In-Memory Processing.

[DOI]

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

2021

In-/Near-Memory Computing.

[DOI]

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01772-8, 2021

A 2.46M Reads/s Seed-Extension Accelerator for Next-Generation Sequencing Using a String-Independent PE Array.

[DOI]

IEEE J. Solid State Circuits, 2021

Cache Compression with Efficient in-SRAM Data Comparison.

[DOI]

Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021

SquiggleFilter: An Accelerator for Portable Virus Detection.

[DOI]

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

GenomicsBench: A Benchmark Suite for Genomics.

[DOI]

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees.

[DOI]

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs.

[DOI]

Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

MyML: User-Driven Machine Learning.

[DOI]

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020

A 28-nm Compute SRAM With Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing.

[DOI]

IEEE J. Solid State Circuits, 2020

17.3 GCUPS Pruning-Based Pair-Hidden-Markov-Model Accelerator for Next-Generation DNA Sequencing.

[DOI]

Proceedings of the IEEE Symposium on VLSI Circuits, 2020

SeedEx: A Genome Sequencing Accelerator for Optimal Alignments in Subminimal Space.

[DOI]

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Neksus: An Interconnect for Heterogeneous System-In-Package Architectures.

[DOI]

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Seesaw: End-to-end Dynamic Sensing for IoT using Machine Learning.

[DOI]

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

A 2.46M reads/s Genome Sequencing Accelerator using a 625 Processing-Element Array.

[DOI]

Proceedings of the 2020 IEEE Custom Integrated Circuits Conference, 2020

MARTINI: Memory Access Traces to Detect Attacks.

[DOI]

Proceedings of the CCSW'20, 2020

2019

TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers.

[DOI]

ACM Trans. Embed. Comput. Syst., 2019

Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks.

[DOI]

IEEE Micro, 2019

Compute cache for data parallel acceleration.

[DOI]

Reetu Das

Proceedings of the 12th International Workshop on Network on Chip Architectures, 2019

A Compute SRAM with Bit-Serial Integer/Floating-Point Operations for Programmable In-Memory Vector Acceleration.

[DOI]

Proceedings of the IEEE International Solid-State Circuits Conference, 2019

Duality cache for data parallel acceleration.

[DOI]

Scott A. Mahlke

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks.

[DOI]

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

2018

ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata.

[DOI]

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

GenAx: A Genome Sequencing Accelerator.

[DOI]

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

In-Memory Data Parallel Processor.

[DOI]

Scott A. Mahlke

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017

Blurring the Lines between Memory and Computation.

[DOI]

Ezhil R. M. Balasubramanian

IEEE Micro, 2017

Mirage cores: the illusion of many out-of-order cores using in-order hardware.

[DOI]

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Cache automaton.

[DOI]

Arun Subramaniyan

Jingcheng Wang

David T. Blaauw

Dennis Sylvester

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism.

[DOI]

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Parallel Automata Processor.

[DOI]

Arun Subramaniyan

Salessawi Ferede Yitbarek

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Cold Boot Attacks are Still Hot: Security Analysis of Memory Scramblers in Modern Processors.

[DOI]

Misiker Tadesse Aga

Todd M. Austin

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Compute Caches.

[DOI]

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

In-memory Data Flow Processor.

[DOI]

Scott A. Mahlke

Ezhil R. M. Balasubramanian

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

Cache Automaton: Repurposing Caches for Automata Processing.

[DOI]

Arun Subramaniyan

Jingcheng Wang

David T. Blaauw

Dennis Sylvester

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Exploring Fine-Grained Heterogeneity with Composite Cores.

[DOI]

IEEE Trans. Computers, 2016

A case for hierarchical rings with deflection routing: An energy-efficient on-chip communication substrate.

[DOI]

Parallel Comput., 2016

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing.

[DOI]

Salessawi Ferede Yitbarek

CoRR, 2016

Exploring specialized near-memory processing for data intensive operations.

[DOI]

Tao Yang

Salessawi Ferede Yitbarek

Todd M. Austin

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks.

[DOI]

Zelalem Birhanu Aweke

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

DynaMOS: dynamic schedule migration for heterogeneous cores.

[DOI]

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Locking down insecure indirection with hardware-based control-data isolation.

[DOI]

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Getting in control of your control flow with control-data isolation.

[DOI]

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

2014

Design and Evaluation of Hierarchical Rings with Deflection Routing.

[DOI]

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Hi-Rise: A High-Radix Switch for 3D Integration with Single-Cycle Arbitration.

[DOI]

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

VIX: Virtual Input Crossbar for Efficient Switch Allocation.

[DOI]

Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Power-Aware NoCs through Routing and Topology Reconfiguration.

[DOI]

Ritesh Parikh

Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Quality-of-Service for a High-Radix Switch.

[DOI]

Proceedings of the 51st Annual Design Automation Conference 2014, 2014

Heterogeneous microarchitectures trump voltage scaling for low-power cores.

[DOI]

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Trace based phase prediction for tightly-coupled heterogeneous cores.

[DOI]

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Catnap: energy proportional multiple network-on-chip.

[DOI]

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Application-to-core mapping policies to reduce memory system interference in multi-core systems.

[DOI]

Onur Mutlu

Akhilesh Kumar

Mani Azimi

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Scaling towards kilo-core processors with asymmetric high-radix topologies.

[DOI]

Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

2012

Swizzle-Switch Networks for Many-Core Systems.

[DOI]

Nathaniel Ross Pinckney

IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Composite Cores: Pushing Heterogeneity Into a Core.

[DOI]

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Swizzle Switch: A self-arbitrating high-radix crossbar for NoC systems.

[DOI]

Nathaniel Ross Pinckney

Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012

High radix self-arbitrating switch fabric with multiple arbitration schemes and quality of service.

[DOI]

Proceedings of the 49th Annual Design Automation Conference 2012, 2012

XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems.

[DOI]

Nathaniel Ross Pinckney

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Application-to-core mapping policies to reduce memory interference in multi-core systems.

[DOI]

Onur Mutlu

Akhilesh Kumar

Mani Azimi

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Aérgia: A Network-on-Chip Exploiting Packet Latency Slack.

[DOI]

IEEE Micro, 2011

RAFT: A router architecture with frequency tuning for on-chip networks.

[DOI]

J. Parallel Distributed Comput., 2011

2010

Aérgia: exploiting packet latency slack in on-chip networks.

[DOI]

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Cost-driven 3D integration with interconnect layers.

[DOI]

Proceedings of the 47th Design Automation Conference, 2010

2009

A case for integrated processor-cache partitioning in chip multiprocessors.

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A case for dynamic frequency tuning in on-chip networks.

[DOI]

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Application-aware prioritization mechanisms for on-chip networks.

[DOI]

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs.

[DOI]

Soumya Eachempati

Asit K. Mishra

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

2008

MIRA: A Multi-layered On-Chip Interconnect Router Architecture.

[DOI]

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

Performance and power optimization through data compression in Network-on-Chip architectures.

[DOI]

Asit K. Mishra

Chrysostomos Nicopoulos

Dongkook Park

Vijaykrishnan Narayanan

Ravishankar R. Iyer

Mazin S. Yousif

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

2007

A novel dimensionally-decomposed router for on-chip communication in 3D architectures.

[DOI]

Jongman Kim

Chrysostomos Nicopoulos

Dongkook Park

Yuan Xie

Mazin S. Yousif

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects.

[DOI]

Dongkook Park

Chrysostomos Nicopoulos

Jongman Kim

Ravishankar R. Iyer