Shivaram Venkataraman

Orcid: 0000-0001-9575-7935

Affiliations:
  • University of Wisconsin-Madison, WI, USA


According to our database, Shivaram Venkataraman authored at least 82 papers between 2010 and 2024.

Bibliography

2024
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters.
CoRR, 2024

GraphSnapShot: Graph Machine Learning Acceleration with Fast Storage and Retrieval.
CoRR, 2024

Decoding Speculative Decoding.
CoRR, 2024

Does Compressing Activations Help Model Parallel Training?
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

CHAI: Clustered Head Attention for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Blox: A Modular Toolkit for Deep Learning Schedulers.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Nautilus: A Benchmarking Platform for DBMS Knob Tuning.
Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning, 2024

2023
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices.
CoRR, 2023

F2: Designing a Key-Value Store for Large Skewed Workloads.
CoRR, 2023

Bagpipe: Accelerating Deep Recommendation Model Training.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Mirage: Towards Low-interruption Services on Batch GPU Clusters with Reinforcement Learning.
Proceedings of the International Conference for High Performance Computing, 2023

Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks.
Proceedings of the Eighteenth European Conference on Computer Systems, 2023

2022
LlamaTune: Sample-Efficient DBMS Configuration Tuning.
Proc. VLDB Endow., 2022

BagPipe: Accelerating Deep Recommendation Model Training.
CoRR, 2022

Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine.
CoRR, 2022

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

On the Utility of Gradient Compression in Distributed Training Systems.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021
The Roaming Edge and its Applications.
GetMobile Mob. Comput. Commun., 2021

Demonstration of Marius: Graph Embeddings with a Single Machine.
Proc. VLDB Endow., 2021

AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning.
CoRR, 2021

Learning Massive Graph Embeddings on a Single Machine.
CoRR, 2021

Accelerating Deep Learning Inference via Learned Caches.
CoRR, 2021

KAISA: an adaptive second-order optimizer framework for deep neural networks.
Proceedings of the International Conference for High Performance Computing, 2021

Marius: Learning Massive Graph Embeddings on a Single Machine.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo.
Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, 2021

Adaptive Gradient Communication via Critical Learning Regime Identification.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Doing more by doing less: how structured partial backpropagation improves deep learning clusters.
Proceedings of the 2nd ACM International Workshop on Distributed Machine Learning (DistributedML '21), 2021

Atoll: A Scalable Low-Latency Serverless Platform.
Proceedings of the ACM Symposium on Cloud Computing (SoCC '21), 2021

2020
Learning-Based Coded Computation.
IEEE J. Sel. Areas Inf. Theory, 2020

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification.
CoRR, 2020

Themis: Fair and Efficient GPU Cluster Scheduling.
Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, 2020

Blink: Fast and Generic Collectives for Distributed ML.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Too Many Knobs to Tune? Towards Faster Database Tuning by Pre-selecting Important Knobs.
Proceedings of the 12th USENIX Workshop on Hot Topics in Storage and File Systems, 2020

Serverless linear algebra.
Proceedings of the ACM Symposium on Cloud Computing (SoCC '20), 2020

2019
Archipelago: A Scalable Low-Latency Serverless Platform.
CoRR, 2019

Themis: Fair and Efficient GPU Cluster Scheduling for Machine Learning Workloads.
CoRR, 2019

Parity Models: A General Framework for Coding-Based Resilience in ML Inference.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

Parity models: erasure-coded resilience for prediction serving systems.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure.
Proceedings of the 16th USENIX Symposium on Networked Systems Design and Implementation, 2019

Cracking open the DNN black-box: Video Analytics with DNNs across the Camera-Cloud Boundary.
Proceedings of the 2019 Workshop on Hot Topics in Video Analytics and Intelligent Edges, 2019

Accelerating Deep Learning Inference via Freezing.
Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing, 2019

The Case for Unifying Data Loading in Machine Learning Clusters.
Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing, 2019

Serverless Event-Stream Processing over Virtual Actors.
Proceedings of the 9th Biennial Conference on Innovative Data Systems Research, 2019

2018
Chi: A Scalable and Programmable Control Plane for Distributed Stream Processing Systems.
Proc. VLDB Endow., 2018

numpywren: serverless linear algebra.
CoRR, 2018

Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation.
CoRR, 2018

ASAP: Fast, Approximate Graph Pattern Mining at Scale.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Focus: Querying Large Video Datasets with Low Latency and Low Cost.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Towards Fast and Scalable Graph Pattern Mining.
Proceedings of the 10th USENIX Workshop on Hot Topics in Cloud Computing, 2018

Bridging the GAP: towards approximate graph analytics.
Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2018

2017
System Design for Large Scale Machine Learning.
PhD thesis, 2017

Hemingway: Modeling Distributed Optimization Algorithms.
CoRR, 2017

Occupy the Cloud: Distributed Computing for the 99%.
CoRR, 2017

Drizzle: Fast and Adaptable Stream Processing at Scale.
Proceedings of the 26th Symposium on Operating Systems Principles, 2017

CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics.
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Breaking Locality Accelerates Block Gauss-Seidel.
Proceedings of the 34th International Conference on Machine Learning, 2017

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Occupy the cloud: distributed computing for the 99%.
Proceedings of the 2017 Symposium on Cloud Computing, SoCC 2017, Santa Clara, CA, USA, 2017

2016
MLlib: Machine Learning in Apache Spark.
J. Mach. Learn. Res., 2016

Large Scale Kernel Learning using Block Coordinate Descent.
CoRR, 2016

Apache Spark: a unified engine for big data processing.
Commun. ACM, 2016

SparkR: Scaling R Programs with Spark.
Proceedings of the 2016 International Conference on Management of Data, 2016

Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics.
Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation, 2016

Matrix Computations and Optimization in Apache Spark.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016

2015
linalg: Matrix Computations in Apache Spark.
CoRR, 2015

2014
Quantifying eventual consistency with PBS.
Commun. ACM, 2014

Record Placement Based on Data Skew Using Solid State Drives.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

The Power of Choice in Data-Aware Cluster Scheduling.
Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, 2014

2013
PBS at work: advancing data management with consistency metrics.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

The Case for Tiny Tasks in Compute Clusters.
Proceedings of the 14th Workshop on Hot Topics in Operating Systems, 2013

Presto: distributed machine learning and graph processing with sparse matrices.
Proceedings of the Eighth EuroSys Conference, 2013

2012
Probabilistically Bounded Staleness for Practical Partial Quorums.
Proc. VLDB Endow., 2012

Sweet Storage SLOs with Frosting.
Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012

Using R for Iterative and Incremental Processing.
Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012

Cake: enabling high-level SLOs on shared storage systems.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

2011
Characterizing Data Structures for Volatile Forensics.
Proceedings of the 2011 IEEE Sixth International Workshop on Systematic Approaches to Digital Forensic Engineering, 2011

Consistent and Durable Data Structures for Non-Volatile Byte-Addressable Memory.
Proceedings of the 9th USENIX Conference on File and Storage Technologies, 2011

2010
Scaling eCGA model building via data-intensive computing.
Proceedings of the IEEE Congress on Evolutionary Computation, 2010

Forenscope: a framework for live forensics.
Proceedings of the Twenty-Sixth Annual Computer Security Applications Conference, 2010

