2025
Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing.
CoRR, January, 2025
2024
Systems Challenges and Opportunities for Genomics.
Computer, August, 2024
Duet: A Collaborative User Driven Recommendation System for Edge Devices.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
2023
nPoRe: n-polymer realigner for improved pileup-based variant calling.
BMC Bioinform., December, 2023
BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks.
ACM Trans. Embed. Comput. Syst., October, 2023
Eidetic: An In-Memory Matrix Multiplication Accelerator for Neural Networks.
IEEE Trans. Computers, June, 2023
GenDP: A Framework of Dynamic Programming Acceleration for Genome Sequencing Analysis.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Vector-Processing for Mobile Devices: Benchmark and Analysis.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023
2022
Hardware-friendly User-specific Machine Learning for Edge Devices.
ACM Trans. Embed. Comput. Syst., September, 2022
Special Issue on In-Memory Computing.
IEEE Micro, 2022
Multi-Layer In-Memory Processing.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
2021
In-/Near-Memory Computing.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01772-8, 2021
A 2.46M Reads/s Seed-Extension Accelerator for Next-Generation Sequencing Using a String-Independent PE Array.
IEEE J. Solid State Circuits, 2021
Cache Compression with Efficient in-SRAM Data Comparison.
Proceedings of the IEEE International Conference on Networking, Architecture and Storage, 2021
SquiggleFilter: An Accelerator for Portable Virus Detection.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021
GenomicsBench: A Benchmark Suite for Genomics.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Accelerated Seeding for Genome Sequence Alignment with Enumerated Radix Trees.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Compute-Capable Block RAMs for Efficient Deep Learning Acceleration on FPGAs.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021
MyML: User-Driven Machine Learning.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
2020
A 28-nm Compute SRAM With Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing.
IEEE J. Solid State Circuits, 2020
17.3 GCUPS Pruning-Based Pair-Hidden-Markov-Model Accelerator for Next-Generation DNA Sequencing.
Proceedings of the IEEE Symposium on VLSI Circuits, 2020
SeedEx: A Genome Sequencing Accelerator for Optimal Alignments in Subminimal Space.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Neksus: An Interconnect for Heterogeneous System-In-Package Architectures.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Seesaw: End-to-end Dynamic Sensing for IoT using Machine Learning.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
A 2.46M reads/s Genome Sequencing Accelerator using a 625 Processing-Element Array.
Proceedings of the 2020 IEEE Custom Integrated Circuits Conference, 2020
MARTINI: Memory Access Traces to Detect Attacks.
Proceedings of the CCSW'20, 2020
2019
TF-Net: Deploying Sub-Byte Deep Neural Networks on Microcontrollers.
ACM Trans. Embed. Comput. Syst., 2019
Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks.
IEEE Micro, 2019
Compute cache for data parallel acceleration.
Proceedings of the 12th International Workshop on Network on Chip Architectures, 2019
A Compute SRAM with Bit-Serial Integer/Floating-Point Operations for Programmable In-Memory Vector Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2019
Duality cache for data parallel acceleration.
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Bit Prudent In-Cache Acceleration of Deep Convolutional Neural Networks.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
2018
ASPEN: A Scalable In-SRAM Architecture for Pushdown Automata.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
GenAx: A Genome Sequencing Accelerator.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
In-Memory Data Parallel Processor.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
2017
Blurring the Lines between Memory and Computation.
IEEE Micro, 2017
Mirage cores: the illusion of many out-of-order cores using in-order hardware.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Parallel Automata Processor.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Cold Boot Attacks are Still Hot: Security Analysis of Memory Scramblers in Modern Processors.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
In-memory Data Flow Processor.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
Cache Automaton: Repurposing Caches for Automata Processing.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
2016
Exploring Fine-Grained Heterogeneity with Composite Cores.
IEEE Trans. Computers, 2016
A case for hierarchical rings with deflection routing: An energy-efficient on-chip communication substrate.
Parallel Comput., 2016
Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing.
CoRR, 2016
Exploring specialized near-memory processing for data intensive operations.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
2015
DynaMOS: dynamic schedule migration for heterogeneous cores.
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Locking down insecure indirection with hardware-based control-data isolation.
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Getting in control of your control flow with control-data isolation.
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015
2014
Design and Evaluation of Hierarchical Rings with Deflection Routing.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Hi-Rise: A High-Radix Switch for 3D Integration with Single-Cycle Arbitration.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
VIX: Virtual Input Crossbar for Efficient Switch Allocation.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014
Power-Aware NoCs through Routing and Topology Reconfiguration.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014
Quality-of-Service for a High-Radix Switch.
Proceedings of the 51st Annual Design Automation Conference 2014, 2014
Heterogeneous microarchitectures trump voltage scaling for low-power cores.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
2013
Trace based phase prediction for tightly-coupled heterogeneous cores.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Catnap: energy proportional multiple network-on-chip.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013
Application-to-core mapping policies to reduce memory system interference in multi-core systems.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Scaling towards kilo-core processors with asymmetric high-radix topologies.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
2012
Swizzle-Switch Networks for Many-Core Systems.
IEEE J. Emerg. Sel. Topics Circuits Syst., 2012
Composite Cores: Pushing Heterogeneity Into a Core.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Swizzle Switch: A self-arbitrating high-radix crossbar for NoC systems.
Proceedings of the 2012 IEEE Hot Chips 24 Symposium (HCS), 2012
High radix self-arbitrating switch fabric with multiple arbitration schemes and quality of service.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012
XPoint cache: scaling existing bus-based coherence protocols for 2D and 3D many-core systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
Application-to-core mapping policies to reduce memory interference in multi-core systems.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
Aérgia: A Network-on-Chip Exploiting Packet Latency Slack.
IEEE Micro, 2011
RAFT: A router architecture with frequency tuning for on-chip networks.
J. Parallel Distributed Comput., 2011
2010
Aérgia: exploiting packet latency slack in on-chip networks.
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Cost-driven 3D integration with interconnect layers.
Proceedings of the 47th Design Automation Conference, 2010
2009
A case for integrated processor-cache partitioning in chip multiprocessors.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
A case for dynamic frequency tuning in on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Application-aware prioritization mechanisms for on-chip networks.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Design and evaluation of a hierarchical on-chip interconnect for next-generation CMPs.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
2008
MIRA: A Multi-layered On-Chip Interconnect Router Architecture.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Performance and power optimization through data compression in Network-on-Chip architectures.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
2007
A novel dimensionally-decomposed router for on-chip communication in 3D architectures.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007
Design of a Dynamic Priority-Based Fast Path Architecture for On-Chip Interconnects.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007