Gregory R. Ganger

Orcid: 0000-0002-3065-7316

  • Carnegie Mellon University, Pittsburgh, USA

According to our database, Gregory R. Ganger authored at least 171 papers between 1993 and 2024.


IEEE Fellow

IEEE Fellow 2011, "For contributions to metadata integrity in file systems".




PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training.
CoRR, 2024

GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism.
CoRR, 2024

Reducing Cross-Cloud/Region Costs with the Auto-Configuring MACARON Cache.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

Morph: Efficient File-Lifetime Redundancy Management for Cluster File Systems.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

FairyWREN: A Sustainable Cache for Emerging Write-Read-Erase Flash Interfaces.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Baleen: ML Admission & Prefetching for Flash Caches.
Proceedings of the 22nd USENIX Conference on File and Storage Technologies, 2024

Extending and Programming the NVMe I/O Determinism Interface for Flash Arrays.
ACM Trans. Storage, February, 2023

Mimir: Finding Cost-efficient Storage Configurations in the Public Cloud.
Proceedings of the 16th ACM International Conference on Systems and Storage, 2023

Sia: Heterogeneity-aware, goodput-optimized ML-cluster scheduling.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

RAIZN: Redundant Array of Independent Zoned Namespaces.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Kangaroo: Theory and Practice of Caching Billions of Tiny Objects on Flash.
ACM Trans. Storage, 2022

Tiger: Disk-Adaptive Redundancy Without Placement Restrictions.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

ZNS: Avoiding the Block Interface Tax for Flash-based SSDs.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Kangaroo: Caching Billions of Tiny Objects on Flash.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

IODA: A Host/Device Co-Design for Strong Predictability Contract on Modern Flash Storage.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

WineFS: a hugepage-aware file system for persistent memory that ages gracefully.
Proceedings of the SOSP '21: ACM SIGOPS 28th Symposium on Operating Systems Principles, 2021

DeltaFS: a scalable no-ground-truth filesystem for massively-parallel computing.
Proceedings of the International Conference for High Performance Computing, 2021

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

File Systems Unfit as Distributed Storage Back Ends: Lessons from 10 Years of Ceph Evolution.
login Usenix Mag., 2020

Streaming Data Reorganization at Scale with DeltaFS Indexed Massive Directories.
ACM Trans. Storage, 2020

The Case for Custom Storage Backends in Distributed Storage Systems.
ACM Trans. Storage, 2020

Mochi: Composing Data Services for High-Performance Computing Environments.
J. Comput. Sci. Technol., 2020

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning.
CoRR, 2020

Vilamb: Low Overhead Asynchronous Redundancy for Direct Access NVM.
CoRR, 2020

PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Unearthing inter-job dependencies for better cluster scheduling.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

The CacheLib Caching Engine: Design and Experiences at Scale.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

TVARAK: Software-Managed Hardware Offload for Redundancy in Direct-Access NVM Storage.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

More IOPS for Less: Exploiting Burstable Storage in Public Clouds.
Proceedings of the 12th USENIX Workshop on Hot Topics in Cloud Computing, 2020

High availability in cheap distributed key value storage.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

Accelerating Deep Learning by Focusing on the Biggest Losers.
CoRR, 2019

Tvarak: Software-managed hardware offload for DAX NVM storage redundancy.
CoRR, 2019

SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019

PipeDream: generalized pipeline parallelism for DNN training.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Peering through the Dark: An Owl's View of Inter-job Dependencies and Jobs' Impact in Shared Clusters.
Proceedings of the 2019 International Conference on Management of Data, 2019

Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity.
Proceedings of the 17th USENIX Conference on File and Storage Technologies, 2019

Compact Filters for Fast Online Data Partitioning.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

The Atlas Cluster Trace Repository.
login Usenix Mag., 2018

PipeDream: Fast and Efficient Pipeline Parallel DNN Training.
CoRR, 2018

MLtuner: System Support for Automatic Machine Learning Tuning.
CoRR, 2018

Geriatrix: Aging what you see and what you don't see. A file system aging approach for modern storage systems.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Tributary: spot-dancing for elastic services with latency SLOs.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

On the diversity of cluster workloads and its impact on research results.
Proceedings of the 2018 USENIX Annual Technical Conference, 2018

Scaling embedded in-situ indexing with deltaFS.
Proceedings of the International Conference for High Performance Computing, 2018

A Case for Packing and Indexing in Cloud File Systems.
Proceedings of the 10th USENIX Workshop on Hot Topics in Cloud Computing, 2018

3Sigma: distribution-based cluster scheduling for runtime uncertainty.
Proceedings of the Thirteenth EuroSys Conference, 2018

Stratus: cost-aware container scheduling in the public cloud.
Proceedings of the ACM Symposium on Cloud Computing, 2018

Online Deduplication for Databases.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds.
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Viyojit: Decoupling Battery and DRAM Capacities for Battery-Backed DRAM.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Proteus: agile ML elasticity through tiered reliability in dynamic resource markets.
Proceedings of the Twelfth European Conference on Computer Systems, 2017

On IO Latency Prediction Accuracy and Automated Load Balancing in Consolidated VM Environments.
Proceedings of the 2016 IEEE International Conference on Cloud Engineering, 2016

TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server.
Proceedings of the Eleventh European Conference on Computer Systems, 2016

Principled workflow-centric tracing of distributed systems.
Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016

Addressing the straggler problem for iterative convergent parallel ML.
Proceedings of the Seventh ACM Symposium on Cloud Computing, 2016

Reducing replication bandwidth for distributed document databases.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Managed communication and consistency for fast data-parallel iterative analytics.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Using data transformations for low-latency time series analysis.
Proceedings of the Sixth ACM Symposium on Cloud Computing, 2015

Agility and Performance in Elastic Distributed Storage.
ACM Trans. Storage, 2014

Exploiting Bounded Staleness to Speed Up Big Data Analytics.
Proceedings of the 2014 USENIX Annual Technical Conference, 2014

SpringFS: bridging agility and performance in elastic distributed storage.
Proceedings of the 12th USENIX conference on File and Storage Technologies, 2014

Toward strong, usable access control for shared distributed data.
Proceedings of the 12th USENIX conference on File and Storage Technologies, 2014

PriorityMeister: Tail Latency QoS for Shared Networked Storage.
Proceedings of the ACM Symposium on Cloud Computing, 2014

Exploiting iterative-ness for parallel ML computations.
Proceedings of the ACM Symposium on Cloud Computing, 2014

Visualizing Request-Flow Comparison to Aid Performance Diagnosis in Distributed Systems.
IEEE Trans. Vis. Comput. Graph., 2013

More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Active disk meets flash: a case for intelligent SSDs.
Proceedings of the International Conference on Supercomputing, 2013

Specialized Storage for Big Numeric Time Series.
Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, 2013

Solving the Straggler Problem with Bounded Staleness.
Proceedings of the 14th Workshop on Hot Topics in Operating Systems, 2013

File system virtual appliances: Portable file system implementations.
ACM Trans. Storage, 2012

RainMon: an integrated approach to mining bursty timeseries monitoring data.
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012

Automated Diagnosis Without Predictability Is a Recipe for Failure.
Proceedings of the 4th USENIX Workshop on Hot Topics in Cloud Computing, 2012

LazyBase: trading freshness for performance in a scalable database.
Proceedings of the European Conference on Computer Systems, 2012

alsched: algebraic scheduling of mixed workloads in heterogeneous clouds.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

Heterogeneity and dynamicity of clouds at scale: Google trace analysis.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '12, 2012

Applying idealized lower-bound runtime models to understand inefficiencies in data-intensive computing.
Proceedings of the SIGMETRICS 2011, 2011

Diagnosing Performance Changes by Comparing Request Flows.
Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation, 2011

Disks Are Like Snowflakes: No Two Are Alike.
Proceedings of the 13th Workshop on Hot Topics in Operating Systems, 2011

Exertion-based Billing for Cloud Storage Access.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Cloud Computing, 2011

Storage-Based Intrusion Detection.
ACM Trans. Inf. Syst. Secur., 2010

Open Cirrus: A Global Cloud Computing Testbed.
Computer, 2010

A Transparently-Scalable Metadata Service for the Ursa Minor Storage System.
Proceedings of the 2010 USENIX Annual Technical Conference, 2010

Zzyzx: Scalable fault tolerance through Byzantine locking.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010

Robust and flexible power-proportional storage.
Proceedings of the 1st ACM Symposium on Cloud Computing, 2010

Access control for home data sharing: evaluating social acceptability.
Proceedings of the 28th International Conference on Human Factors in Computing Systems, 2010

Perspective: Semantic Data Management for the Home.
login Usenix Mag., 2009

Relative fitness modeling.
Commun. ACM, 2009

Co-scheduling of Disk Head Time in Cluster-Based Storage.
Proceedings of the 28th IEEE Symposium on Reliable Distributed Systems (SRDS 2009), 2009

Safe and effective fine-grained TCP retransmissions for datacenter communication.
Proceedings of the ACM SIGCOMM 2009 Conference on Applications, 2009

Tashi: location-aware cluster management.
Proceedings of the 1st Workshop on Automated Control for Datacenters and Clouds, 2009

In Search of an API for Scalable File Systems: Under the Table or Above It?
Proceedings of the Workshop on Hot Topics in Cloud Computing, 2009

Ironmodel: robust performance models in the wild.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

Using Utility to Provision Storage Systems.
Proceedings of the 6th USENIX Conference on File and Storage Technologies, 2008

Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems.
Proceedings of the 6th USENIX Conference on File and Storage Technologies, 2008

Using Provenance to Aid in Personal File Search.
Proceedings of the 2007 USENIX Annual Technical Conference, 2007

Low-overhead byzantine fault-tolerant storage.
Proceedings of the 21st ACM Symposium on Operating Systems Principles 2007, 2007

Modeling the relative fitness of storage.
Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2007

On application-level approaches to avoiding TCP throughput collapse in cluster-based storage systems.
Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07), 2007

Verifying distributed erasure-coded data.
Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, 2007

MultiMap: Preserving disk locality for multidimensional datasets.
Proceedings of the 23rd International Conference on Data Engineering, 2007

Argon: Performance Insulation for Shared Storage Servers.
Proceedings of the 5th USENIX Conference on File and Storage Technologies, 2007

//TRACE: Parallel Trace Replay with Approximate Causal Events.
Proceedings of the 5th USENIX Conference on File and Storage Technologies, 2007

InteMon: continuous mining of sensor data in large-scale self-infrastructures.
ACM SIGOPS Oper. Syst. Rev., 2006

Relative fitness models for storage.
SIGMETRICS Perform. Evaluation Rev., 2006

Towards self-predicting systems: What if you could ask 'what-if'?
Knowl. Eng. Rev., 2006

Early experiences on the journey towards self-* storage.
IEEE Data Eng. Bull., 2006

Stardust: tracking activity in a distributed storage system.
Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2006

Informed data distribution selection in a self-predicting storage system.
Proceedings of the 3rd International Conference on Autonomic Computing, 2006

Intelligent system monitoring on large clusters.
Proceedings of the 3rd Workshop on Data Management for Sensor Networks, 2006

Towards bounded wait-free PASIS.
Proceedings of From Security to Dependability, 10.09. - 15.09.2006, 2006

Log-based architectures for general-purpose monitoring of deployed code.
Proceedings of the 1st Workshop on Architectural and System Support for Improving Software Dependability, 2006

Comparison-Based File Server Verification.
Proceedings of the 2005 USENIX Annual Technical Conference, 2005

Lazy Verification in Fault-Tolerant Distributed Storage Systems.
Proceedings of the 24th IEEE Symposium on Reliable Distributed Systems (SRDS 2005), 2005

Connections: using context to enhance file search.
Proceedings of the 20th ACM Symposium on Operating Systems Principles 2005, 2005

Fault-scalable Byzantine fault-tolerant services.
Proceedings of the 20th ACM Symposium on Operating Systems Principles 2005, 2005

Scheduling speculative tasks in a compute farm.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Replication policies for layered clustering of NFS servers.
Proceedings of the 13th International Symposium on Modeling, 2005

On Multidimensional Data and Modern Disks.
Proceedings of the FAST '05 Conference on File and Storage Technologies, 2005

Clotho: Decoupling memory page layout from storage organization.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

Storage Device Performance Prediction with CART Models.
Proceedings of the 12th International Workshop on Modeling, 2004

Cluster scheduling for explicitly-speculative tasks.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

File Classification in Self-* Storage Systems.
Proceedings of the 1st International Conference on Autonomic Computing (ICAC 2004), 2004

A Framework for Building Unobtrusive Disk Maintenance Applications (Awarded Best Student Paper!).
Proceedings of the FAST '04 Conference on File and Storage Technologies, March 31, 2004

MEMS-based Storage Devices and Standard Disk Interfaces: A Square Peg in a Round Hole?
Proceedings of the FAST '04 Conference on File and Storage Technologies, March 31, 2004

Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks.
Proceedings of the FAST '04 Conference on File and Storage Technologies, March 31, 2004

Diamond: A Storage Architecture for Early Discard in Interactive Search.
Proceedings of the FAST '04 Conference on File and Storage Technologies, March 31, 2004

Dynamic Quarantine of Internet Worms.
Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN 2004), 28 June, 2004

Efficient Byzantine-Tolerant Erasure-Coded Storage.
Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN 2004), 28 June, 2004

Enabling autonomic behavior in systems software with hot swapping.
IBM Syst. J., 2003

Object-based storage.
IEEE Commun. Mag., 2003

Lachesis: Robust Database Storage Management Based on Device-specific Performance Characteristics.
Proceedings of the 29th International Conference on Very Large Data Bases, 2003

Storage-based Intrusion Detection: Watching Storage Activity for Suspicious Behavior.
Proceedings of the 12th USENIX Security Symposium, Washington, D.C., USA, August 4-8, 2003, 2003

System Support for Online Reconfiguration.
Proceedings of the General Track: 2003 USENIX Annual Technical Conference, 2003

Why Can't I Find My Files? New Methods for Automating Attribute Assignment.
Proceedings of HotOS'03: 9th Workshop on Hot Topics in Operating Systems, 2003

Metadata Efficiency in Versioning File Systems.
Proceedings of the FAST '03 Conference on File and Storage Technologies, March 31, 2003

Fast and flexible application-level networking on exokernel systems.
ACM Trans. Comput. Syst., 2002

Track-Aligned Extents: Matching Access Patterns to Disk Drive Characteristics.
Proceedings of the FAST '02 Conference on File and Storage Technologies, 2002

Freeblock Scheduling Outside of Disk Firmware.
Proceedings of the FAST '02 Conference on File and Storage Technologies, 2002

Timing-Accurate Storage Emulation.
Proceedings of the FAST '02 Conference on File and Storage Technologies, 2002

Hinting for Goodness' Sake.
Proceedings of HotOS-VIII: 8th Workshop on Hot Topics in Operating Systems, 2001

Better Security via Smarter Devices.
Proceedings of HotOS-VIII: 8th Workshop on Hot Topics in Operating Systems, 2001

Authentication Confidences.
Proceedings of HotOS-VIII: 8th Workshop on Hot Topics in Operating Systems, 2001

Soft updates: a solution to the metadata update problem in file systems.
ACM Trans. Comput. Syst., 2000

Survivable Information Storage Systems.
Computer, 2000

MEMS-based integrated-circuit mass-storage systems.
Commun. ACM, 2000

Journaling Versus Soft Updates: Asynchronous Meta-data Protection in File Systems.
Proceedings of the General Track: 2000 USENIX Annual Technical Conference, 2000

Dynamic Function Placement for Data-Intensive Cluster Computing.
Proceedings of the General Track: 2000 USENIX Annual Technical Conference, 2000

Easing the management of data-parallel systems via adaptation.
Proceedings of the 9th ACM SIGOPS European Workshop, 2000

Data Mining on an OLTP System (Nearly) for Free.
Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, 2000

Automated disk drive characterization (poster).
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2000

Modeling and performance of MEMS-based storage devices.
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2000

Self-Securing Storage: Protecting Data in Compromised Systems.
Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), 2000

Towards Higher Disk Head Utilization: Extracting "Free" Bandwidth from Busy Disk Drives.
Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), 2000

Operating System Management of MEMS-based Storage Devices.
Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), 2000

Designing computer systems with MEMS-based storage.
ASPLOS-IX: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems, 2000

Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem.
Proceedings of the FREENIX Track: 1999 USENIX Annual Technical Conference, 1999

Using System-Level Models to Evaluate I/O Subsystem Designs.
IEEE Trans. Computers, 1998

Embedded Inodes and Explicit Grouping: Exploiting Disk Bandwidth for Small Files.
Proceedings of the 1997 USENIX Annual Technical Conference, 1997

Application Performance and Flexibility on Exokernel Systems.
Proceedings of the Sixteenth ACM Symposium on Operating System Principles, 1997

Server operating systems.
Proceedings of the 7th ACM SIGOPS European Workshop: Systems Support for Worldwide Applications, 1996

System-oriented evaluation of I/O subsystem performance.
PhD thesis, 1995

On-Line Extraction of SCSI Disk Drive Parameters.
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems, 1995

Fellowship Winner: Generating Representative Synthetic Workloads: An Unsolved Problem.
Proceedings of the 21st International Computer Measurement Group Conference, 1995

Disk Arrays: High-Performance, High-Reliability Storage Subsystems.
Computer, 1994

Scheduling Algorithms for Modern Disk Drives.
Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1994

Metadata Update Performance in File Systems.
Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1994

The Process-Flow Model: Examining I/O Performance from the System's Point of View.
Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems, 1993
