2024
Lossless KV Cache Compression to 2%.
Zhen Yang, J. N. Han, Kan Wu, Ruobing Xie, An Wang, Xingwu Sun, Zhanhui Kang
CoRR, 2024
HMoE: Heterogeneous Mixture of Experts for Language Modeling.
An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-Zhong Xu
CoRR, 2024