2024
Inference Performance Optimization for Large Language Models on CPUs.
Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie
CoRR, 2024
Distributed Inference Performance Optimization for LLMs on CPUs.
Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui
CoRR, 2024