FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving.
CoRR, January, 2025
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning.
CoRR, 2023
TensorIR: An Abstraction for Automatic Tensorized Program Optimization.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
Tensor Program Optimization with Probabilistic Programs.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Cross-Stream Selective Networks for Action Recognition.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019
Recurrent Residual Module for Fast Inference in Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018