Junbo Deng
Bibliography
2024
AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving.
CoRR, 2024
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024