2025
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression.
CoRR, February 2025
2024
SDQ: Sparse Decomposed Quantization for LLM Inference.
CoRR, 2024
Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition.
CoRR, 2024
Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
Mind the Gap: Attainable Data Movement and Operational Intensity Bounds for Tensor Algorithms.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
2023
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.
ACM Trans. Comput. Syst., 2023
HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).
Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023
Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
Ruby: Improving Hardware Efficiency for Tensor Algebra Accelerators Through Imperfect Factorization.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2022
SIMD²: a generalized matrix instruction set for accelerating tensor computation beyond GEMM.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Demystifying Map Space Exploration for NPUs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022
2021
Leaking Secrets Through Compressed Caches.
IEEE Micro, 2021
Sparseloop: An Analytical, Energy-Focused Design Space Exploration Methodology for Sparse Tensor Accelerators.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021
Mind mappings: enabling efficient algorithm-accelerator mapping space search.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021
2020
Safecracker: Leaking Secrets through Compressed Caches.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
2019
Compress Objects, Not Cache Lines: An Object-Based Compressed Memory Hierarchy.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
2018
Rethinking the Memory Hierarchy for Modern Languages.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Adaptive Scheduling for Systems with Asymmetric Memory Hierarchies.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
KPart: A Hybrid Cache Partitioning-Sharing Technique for Commodity Multicores.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
Jenga: Software-Defined Cache Hierarchies.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Nexus: A New Approach to Replication in Distributed Shared Caches.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
2016
Uncertainty and Mental Workload Among Wayfinding Strategies.
Proceedings of the Universal Access in Human-Computer Interaction. Users and Context Diversity, 2016
2015
Feature space optimization of multispectral imagery and LiDAR waveform data.
Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium, 2015
Scaling distributed cache hierarchies through computation and data co-scheduling.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
2013
Hybrid path-diversity-aware adaptive routing with latency prediction model in Network-on-Chip systems.
Proceedings of the 2013 International Symposium on VLSI Design, Automation, and Test, 2013
2012
Path-Diversity-Aware Adaptive Routing in Network-on-Chip Systems.
Proceedings of the IEEE 6th International Symposium on Embedded Multicore/Manycore SoCs, 2012