Xingkun Yang

According to our database, Xingkun Yang authored at least 2 papers in 2024.

2024

AttentionStore: Cost-effective Attention Reuse across Multi-turn Conversations in Large Language Model Serving.

CoRR, 2024

Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention.

Proceedings of the 2024 USENIX Annual Technical Conference, 2024