2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
CoRR, 2024