2024
Lossless KV Cache Compression to 2%.
CoRR, 2024

HMoE: Heterogeneous Mixture of Experts for Language Modeling.
CoRR, 2024