Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
CoRR, April, 2025
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024
FlexHE: A flexible Kernel Generation Framework for Homomorphic Encryption-Based Private Inference.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024
MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024
MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.
IEEE Trans. Parallel Distributed Syst., 2022
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020