Hardware-Software Co-Design Enabling Static and Dynamic Sparse Attention Mechanisms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., September, 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.
CoRR, 2024