Jenga: Effective Memory Management for Serving LLM with Heterogeneity.
CoRR, March 2025
SkyStore: Cost-Optimized Object Storage Across Regions and Clouds.
CoRR, February 2025
LLMs Can Easily Learn to Reason from Demonstrations; Structure, not content, is what matters!
CoRR, February 2025
Pie: Pooling CPU Memory for LLM Inference.
CoRR, 2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
CoRR, 2024
Optimizing LLM Queries in Relational Workloads.
CoRR, 2024
Cloudcast: High-Throughput, Cost-Aware Overlay Multicast in the Cloud.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
RALF: Accuracy-Aware Scheduling for Feature Store Maintenance.
Proc. VLDB Endow., November 2023
Context-Aware Streaming Perception in Dynamic Environments.
Computer Vision - ECCV 2022, 2022
Towards Scalable Dataframe Systems.
Proc. VLDB Endow., 2020
InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines.
Proceedings of SoCC '20: ACM Symposium on Cloud Computing, 2020
Pay Attention to Convolution Filters: Towards Fast and Accurate Fine-Grained Transfer Learning.
CoRR, 2019
The OoO VLIW JIT Compiler for GPU Inference.
CoRR, 2019
Dynamic Space-Time Scheduling for GPU Inference.
CoRR, 2019
InferLine: ML Inference Pipeline Composition Framework.
CoRR, 2018