2025

Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization.

Minsu Kim

Seongmin Hong

CoRR, March, 2025

MixDiT: Accelerating Image Diffusion Transformer Inference With Mixed-Precision MX Quantization.

IEEE Comput. Archit. Lett., 2025

Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization.

Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025

2024

Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication.

ACM Trans. Archit. Code Optim., June, 2024

Accelerating String-key Learned Index Structures via Memoization-based Incremental Training.

Proc. VLDB Endow., April, 2024

Hardware-hardened Sandbox Enclaves for Trusted Serverless Computing.

ACM Trans. Archit. Code Optim., March, 2024

A Latency Processing Unit: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference.

IEEE Micro, 2024

LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference.

CoRR, 2024

ONNXim: A Fast, Cycle-Level Multi-Core NPU Simulator.

IEEE Comput. Archit. Lett., 2024

DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale.

Proceedings of the IEEE International Symposium on Workload Characterization, 2024

Interference-Aware DNN Serving on Heterogeneous Processors in Edge Systems.

Proceedings of the 42nd IEEE International Conference on Computer Design, 2024

LVS: A Learned Video Storage for Fast and Efficient Video Understanding.

Yunghee Lee

Jongse Park

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

Tandem Processor: Grappling with Emerging Operators in Neural Networks.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support.

IEEE Trans. Computers, September, 2023

HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization.

IEEE Comput. Archit. Lett., 2023

2022

Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration.

Brahmendra Reddy Yatham

IEEE Micro, 2022

CoVA: Exploiting Compressed-Domain Analysis to Accelerate Video Analytics.

Proceedings of the 2022 USENIX Annual Technical Conference, 2022

Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing.

Proceedings of the 2022 USENIX Annual Technical Conference, 2022

Tunable Memory Protection for Secure Neural Processing Units.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

Supporting Dynamic Translation Granularity for Hybrid Memory Systems.

Proceedings of the IEEE 40th International Conference on Computer Design, 2022

TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021

SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms.

ACM Trans. Archit. Code Optim., 2021

Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning.

CoRR, 2021

Stockade: Hardware Hardening for Distributed Trusted Sandboxes.

CoRR, 2021

Common Counters: Compressed Encryption Counters for Secure GPU Memory.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

Decoupled Address Translation for Heterogeneous Memory Systems.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Machine Learning Acceleration.

Hadi Esmaeilzadeh

Jongse Park

IEEE Micro, 2019

2018

A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks.

CoRR, 2017

Scale-out acceleration for machine learning.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016

From high-level deep neural models to FPGAs.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

TABLA: A unified template-based framework for accelerating statistical machine learning.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing.

Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016

2015

Axilog: Abstractions for Approximate Hardware Design and Reuse.

Anandhavel Nagendrakumar

Abbas Rahimi

Hadi Esmaeilzadeh

Kia Bazargan

IEEE Micro, 2015

FlexJava: language support for safe and modular approximate programming.

Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015

Neural acceleration for GPU throughput processors.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Axilog: language support for approximate hardware design.

Anandhavel Nagendrakumar

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014

General-purpose code acceleration with limited-precision analog computation.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Rollback-free value prediction with approximate loads.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Isolated Mini-domain for Trusted Cloud Computing.

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

Locality-aware dynamic VM reconfiguration on MapReduce clouds.

Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012