2025
Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler.
CoRR, April 2025

2024
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction.
Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

vMCU: Coordinated Memory Management and Kernel Optimization for DNN Inference on MCUs.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

FlexHE: A Flexible Kernel Generation Framework for Homomorphic Encryption-Based Private Inference.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers.
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, 2024

MoteNN: Memory Optimization via Fine-grained Scheduling for Deep Neural Networks on Tiny Devices.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Chimera: An Analytical Optimizing Framework for Effective Compute-intensive Operators Fusion.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training.
IEEE Trans. Parallel Distributed Syst., 2022

AMOS: Enabling Automatic Mapping for Tensor Computations on Spatial Accelerators with Hardware Abstraction.
Proceedings of ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18-22, 2022

2020
FlexTensor: An Automatic Schedule Exploration and Optimization Framework for Tensor Computation on Heterogeneous System.
Proceedings of ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020