Junbo Deng
Timeline
2024
0
1
2
3
1
1
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2024
AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving.
CoRR, 2024
Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024