2025
Jenga: Effective Memory Management for Serving LLM with Heterogeneity.
CoRR, March 2025

SkyStore: Cost-Optimized Object Storage Across Regions and Clouds.
CoRR, February 2025

LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters!
CoRR, February 2025

2024
Pie: Pooling CPU Memory for LLM Inference.
CoRR, 2024

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
CoRR, 2024

Optimizing LLM Queries in Relational Workloads.
CoRR, 2024

Cloudcast: High-Throughput, Cost-Aware Overlay Multicast in the Cloud.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24), 2024

2023
RALF: Accuracy-Aware Scheduling for Feature Store Maintenance.
Proc. VLDB Endow., November 2023

2022
Context-Aware Streaming Perception in Dynamic Environments.
Proceedings of the European Conference on Computer Vision (ECCV), 2022

2020
Towards Scalable Dataframe Systems.
Proc. VLDB Endow., 2020

InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines.
Proceedings of the ACM Symposium on Cloud Computing (SoCC '20), 2020

2019
Pay Attention to Convolution Filters: Towards Fast and Accurate Fine-Grained Transfer Learning.
CoRR, 2019

The OoO VLIW JIT Compiler for GPU Inference.
CoRR, 2019

Dynamic Space-Time Scheduling for GPU Inference.
CoRR, 2019

2018
InferLine: ML Inference Pipeline Composition Framework.
CoRR, 2018