Jenga: Effective Memory Management for Serving LLM with Heterogeneity.
CoRR, March 2025
SkyStore: Cost-Optimized Object Storage Across Regions and Clouds.
CoRR, February 2025
LLMs Can Easily Learn to Reason from Demonstrations; Structure, not content, is what matters!
CoRR, February 2025
Pie: Pooling CPU Memory for LLM Inference.
CoRR, 2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
CoRR, 2024
Optimizing LLM Queries in Relational Workloads.
CoRR, 2024
Cloudcast: High-Throughput, Cost-Aware Overlay Multicast in the Cloud.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
RALF: Accuracy-Aware Scheduling for Feature Store Maintenance.
Proc. VLDB Endow., November 2023
Context-Aware Streaming Perception in Dynamic Environments.
Computer Vision - ECCV 2022, 2022
Towards Scalable Dataframe Systems.
Proc. VLDB Endow., 2020
InferLine: Latency-Aware Provisioning and Scaling for Prediction Serving Pipelines.
Proceedings of SoCC '20: ACM Symposium on Cloud Computing, 2020
Pay Attention to Convolution Filters: Towards Fast and Accurate Fine-Grained Transfer Learning.
CoRR, 2019
The OoO VLIW JIT Compiler for GPU Inference.
CoRR, 2019
Dynamic Space-Time Scheduling for GPU Inference.
CoRR, 2019
InferLine: ML Inference Pipeline Composition Framework.
CoRR, 2018