NanoFlow: Towards Optimal Large Language Model Serving Throughput.
CoRR, 2024
Can Storage Devices be Power Adaptive?
Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 2024
Optimizing half precision Winograd convolution on ARM many-core processors.
Proceedings of the APSys '22: 13th ACM SIGOPS Asia-Pacific Workshop on Systems, Virtual Event, Singapore, August 23, 2022