TurboAttention: Efficient Attention Approximation For High Throughputs LLMs.
CoRR, 2024
Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers.
CoRR, 2024
Predict; Don't React for Enabling Efficient Fine-Grain DVFS in GPUs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Predict; Do not React for Enabling Efficient Fine Grain DVFS in GPUs.
CoRR, 2022
Accelerating Variational Quantum Algorithms Using Circuit Concurrency.
CoRR, 2021
DUB: Dynamic Underclocking and Bypassing in NoCs for Heterogeneous GPU Workloads.
Proceedings of the NOCS '21: International Symposium on Networks-on-Chip, 2021
Kite: A Family of Heterogeneous Interposer Topologies Enabled via Accurate Interconnect Modeling.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
Optimizing GPU Cache Policies for MI Workloads.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019
Scalable Distributed Last-Level TLBs Using Low-Latency Interconnects.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018