2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization.
CoRR, March, 2025
MixDiT: Accelerating Image Diffusion Transformer Inference With Mixed-Precision MX Quantization.
IEEE Comput. Archit. Lett., 2025
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization.
Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025
2024
Cerberus: Triple Mode Acceleration of Sparse Matrix and Vector Multiplication.
ACM Trans. Archit. Code Optim., June, 2024
Accelerating String-key Learned Index Structures via Memoization-based Incremental Training.
Proc. VLDB Endow., April, 2024
Hardware-hardened Sandbox Enclaves for Trusted Serverless Computing.
ACM Trans. Archit. Code Optim., March, 2024
A Latency Processing Unit: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference.
IEEE Micro, 2024
LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference.
CoRR, 2024
ONNXim: A Fast, Cycle-Level Multi-Core NPU Simulator.
IEEE Comput. Archit. Lett., 2024
DACAPO: Accelerating Continuous Learning in Autonomous Systems for Video Analytics.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale.
Proceedings of the IEEE International Symposium on Workload Characterization, 2024
Interference-Aware DNN Serving on Heterogeneous Processors in Edge Systems.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024
LVS: A Learned Video Storage for Fast and Efficient Video Understanding.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Tandem Processor: Grappling with Emerging Operators in Neural Networks.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
FlexBlock: A Flexible DNN Training Accelerator With Multi-Mode Block Floating Point Support.
IEEE Trans. Computers, September, 2023
HAMMER: Hardware-Friendly Approximate Computing for Self-Attention With Mean-Redistribution And Linearization.
IEEE Comput. Archit. Lett., 2023
2022
Yin-Yang: Programming Abstractions for Cross-Domain Multi-Acceleration.
IEEE Micro, 2022
CoVA: Exploiting Compressed-Domain Analysis to Accelerate Video Analytics.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022
Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022
Tunable Memory Protection for Secure Neural Processing Units.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
Supporting Dynamic Translation Granularity for Hybrid Memory Systems.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
2021
SLO-Aware Inference Scheduler for Heterogeneous Processors in Edge Platforms.
ACM Trans. Archit. Code Optim., 2021
Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning.
CoRR, 2021
Stockade: Hardware Hardening for Distributed Trusted Sandboxes.
CoRR, 2021
Common Counters: Compressed Encryption Counters for Secure GPU Memory.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
2020
Decoupled Address Translation for Heterogeneous Memory Systems.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020
Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020
2019
Machine Learning Acceleration.
IEEE Micro, 2019
2018
A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
2017
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks.
CoRR, 2017
Scale-out acceleration for machine learning.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
2016
From high-level deep neural models to FPGAs.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Towards Statistical Guarantees in Controlling Quality Tradeoffs for Approximate Acceleration.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
TABLA: A unified template-based framework for accelerating statistical machine learning.
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
AxGames: Towards Crowdsourcing Quality Target Determination in Approximate Computing.
Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems, 2016
2015
Axilog: Abstractions for Approximate Hardware Design and Reuse.
IEEE Micro, 2015
FlexJava: language support for safe and modular approximate programming.
Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015
Neural acceleration for GPU throughput processors.
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Axilog: language support for approximate hardware design.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
2014
General-purpose code acceleration with limited-precision analog computation.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Rollback-free value prediction with approximate loads.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2014
2013
Isolated Mini-domain for Trusted Cloud Computing.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, 2013
2012
Locality-aware dynamic VM reconfiguration on MapReduce clouds.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012