Hritvik Taneja

Aamer Jaleel

Moinuddin Qureshi

CoRR, June, 2025

RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression.

CoRR, February, 2025

2024

SDQ: Sparse Decomposed Quantization for LLM Inference.

CoRR, 2024

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition.

Geonhwa Jeong

Abhimanyu Rajeshkumar Bambhaniya

Stephen W. Keckler

Tushar Krishna

CoRR, 2024

Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse.

Yunan Zhang

Hung-Wei Tseng

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms.

Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

2023

Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.

ACM Trans. Comput. Syst., 2023

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).

Toluwanimi O. Odemuyiwa

Hadi Asghari Moghaddam

Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.

Toluwanimi O. Odemuyiwa

Hadi Asghari Moghaddam

Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022

Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization.

Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

SIMD²: a generalized matrix instruction set for accelerating tensor computation beyond GEMM.

Yunan Zhang

Hung-Wei Tseng

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Demystifying Map Space Exploration for NPUs.

Proceedings of the IEEE International Symposium on Workload Characterization, 2022

2021

Leaking Secrets Through Compressed Caches.

Andrés Sánchez

IEEE Micro, 2021

Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Mind mappings: enabling efficient algorithm-accelerator mapping space search.

Sivasankaran Rajamanickam

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.

Roberto Gioiosa

Tushar Krishna

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Safecracker: Leaking Secrets through Compressed Caches.

Andrés Sánchez

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018

Rethinking the Memory Hierarchy for Modern Languages.

Yee Ling Gan

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies.

Changping Chen

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

Jenga: Software-Defined Cache Hierarchies.

Nathan Beckmann

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Nexus: A New Approach to Replication in Distributed Shared Caches.

Nathan Beckmann

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

Uncertainty and Mental Workload Among Wayfinding Strategies.

Proceedings of the Universal Access in Human-Computer Interaction. Users and Context Diversity, 2016

2015

Feature space optimization of multispectral imagery and LiDAR waveform data.

Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium, 2015

Scaling distributed cache hierarchies through computation and data co-scheduling.

Nathan Beckmann