2025
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving.
CoRR, January 2025

2023
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning.
CoRR, 2023

TensorIR: An Abstraction for Automatic Tensorized Program Optimization.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Tensor Program Optimization with Probabilistic Programs.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2019
Cross-Stream Selective Networks for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

2018
Recurrent Residual Module for Fast Inference in Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018