Portable, High Performance Matrix Multiplication Micro-Kernels for RISC-V with ExO.
Proceedings of the 33rd Euromicro International Conference on Parallel, 2025
Tackling the Matrix Multiplication Micro-Kernel Generation with Exo.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024
Full Stack Optimization of Transformer Inference: a Survey.
CoRR, 2023
DOSA: Differentiable Model-Based One-Loop Search for DNN Accelerators.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Communication bounds for convolutional neural networks.
Proceedings of the PASC '22: Platform for Advanced Scientific Computing Conference, Basel, Switzerland, June 27, 2022
CoSA: Scheduling by Constrained Optimization for Spatial Accelerators.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds.
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020
Communication-Optimal Convolutional Neural Nets.
CoRR, 2018