2025
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow.
Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2025
2024
Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs.
CoRR, 2024
Communication-efficient, Fault Tolerant PIR over Erasure Coded Storage.
Proceedings of the IEEE Symposium on Security and Privacy, 2024
Morph: Efficient File-Lifetime Redundancy Management for Cluster File Systems.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024
SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
On Low Field Size Constructions of Access-Optimal Convertible Codes.
Proceedings of the IEEE International Symposium on Information Theory, 2024
Rethinking Erasure-Coding Libraries in the Age of Optimized Machine Learning.
Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 2024
2023
Bandwidth Cost of Code Conversions in Distributed Storage: Fundamental Limits and Optimal Constructions.
IEEE Trans. Inf. Theory, August, 2023
Online Versus Offline Rate in Streaming Codes for Variable-Size Messages.
IEEE Trans. Inf. Theory, June, 2023
Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding.
Proc. VLDB Endow., 2023
FIFO queues are all you need for cache eviction.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023
Tambur: Efficient loss recovery for videoconferencing via streaming codes.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023
Learning-augmented streaming codes for variable-size messages under partial burst losses.
Proceedings of the IEEE International Symposium on Information Theory, 2023
On expanding the toolkit of locality-based coded computation to the coordinates of inputs.
Proceedings of the IEEE International Symposium on Information Theory, 2023
Compression-Informed Coded Computing.
Proceedings of the IEEE International Symposium on Information Theory, 2023
Locally Repairable Convertible Codes: Erasure Codes for Efficient Repair and Conversion.
Proceedings of the IEEE International Symposium on Information Theory, 2023
FIFO can be Better than LRU: the Power of Lazy Promotion and Quick Demotion.
Proceedings of the 19th Workshop on Hot Topics in Operating Systems, 2023
GL-Cache: Group-level learning for efficient and high-performance caching.
Proceedings of the 21st USENIX Conference on File and Storage Technologies, 2023
2022
Streaming Codes for Variable-Size Messages.
IEEE Trans. Inf. Theory, 2022
Convertible Codes: Enabling Efficient Conversion of Coded Data in Distributed Storage.
IEEE Trans. Inf. Theory, 2022
Tiger: Disk-Adaptive Redundancy Without Placement Restrictions.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022
C2DN: How to Harness Erasure Codes at the Edge for Efficient Content Delivery.
Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, 2022
Learning-Augmented Streaming Codes are Approximately Optimal for Variable-Size Messages.
Proceedings of the IEEE International Symposium on Information Theory, 2022
Bandwidth Cost of Code Conversions in the Split Regime.
Proceedings of the IEEE International Symposium on Information Theory, 2022
2021
A Large-scale Analysis of Hundreds of In-memory Key-value Cache Clusters at Twitter.
ACM Trans. Storage, 2021
ECRM: Efficient Fault Tolerance for Recommendation Model Training via Erasure Coding.
CoRR, 2021
Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs.
Proceedings of the International Conference for High Performance Computing, 2021
Segcache: a memory-efficient and scalable in-memory key-value cache for small objects.
Proceedings of the 18th USENIX Symposium on Networked Systems Design and Implementation, 2021
A locality-based lens for coded computation.
Proceedings of the IEEE International Symposium on Information Theory, 2021
Irregular Array Codes with Arbitrary Access Sets for Geo-Distributed Storage.
Proceedings of the IEEE International Symposium on Information Theory, 2021
Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size.
Proceedings of the 38th International Conference on Machine Learning, 2021
2020
Learning-Based Coded Computation.
IEEE J. Sel. Areas Inf. Theory, 2020
A locality-based approach for coded computation.
CoRR, 2020
A large scale analysis of hundreds of in-memory cache clusters at Twitter.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020
PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020
Access-optimal Linear MDS Convertible Codes for All Parameters.
Proceedings of the IEEE International Symposium on Information Theory, 2020
Convertible Codes: New Class of Codes for Efficient Conversion of Coded Data in Distributed Storage.
Proceedings of the 11th Innovations in Theoretical Computer Science Conference, 2020
2019
Convertible Codes: Efficient Conversion of Coded Data in Distributed Storage.
CoRR, 2019
Parity Models: A General Framework for Coding-Based Resilience in ML Inference.
CoRR, 2019
SysML: The New Frontier of Machine Learning Systems.
CoRR, 2019
Parity models: erasure-coded resilience for prediction serving systems.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019
Vantage: optimizing video upload for time-shifted viewing of social live streams.
Proceedings of the ACM Special Interest Group on Data Communication, 2019
Cluster storage systems gotta have HeART: improving storage efficiency by exploiting disk-reliability heterogeneity.
Proceedings of the 17th USENIX Conference on File and Storage Technologies, 2019
2018
Information-Theoretically Secure Erasure Codes for Distributed Storage.
IEEE Trans. Inf. Theory, 2018
Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation.
CoRR, 2018
Streaming Codes For Variable-Size Arrivals.
Proceedings of the 56th Annual Allerton Conference on Communication, 2018
2017
A Piggybacking Design Framework for Read- and Download-Efficient Distributed Storage Codes.
IEEE Trans. Inf. Theory, 2017
2016
Erasure Coding for Big-data Systems: Theory and Practice.
PhD thesis, 2016
EC-Cache: Load-Balanced, Low-Latency Cluster Caching with Online Erasure Coding.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016
Optimal systematic distributed storage codes with fast encoding.
Proceedings of the IEEE International Symposium on Information Theory, 2016
2015
Distributed Secret Dissemination Across a Network.
IEEE J. Sel. Top. Signal Process., 2015
Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth.
Proceedings of the 13th USENIX Conference on File and Storage Technologies, 2015
DART: Dropouts meet Multiple Additive Regression Trees.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015
2014
A "hitchhiker's" guide to fast and efficient data reconstruction in erasure-coded data centers.
Proceedings of the ACM SIGCOMM 2014 Conference, 2014
One extra bit of download ensures perfectly private information retrieval.
Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, June 29, 2014
Optimality of the product-matrix construction for secure MSR regenerating codes.
Proceedings of the 6th International Symposium on Communications, 2014
Fundamental limits on communication for oblivious updates in storage networks.
Proceedings of the IEEE Global Communications Conference, 2014
2013
Secure network coding for distributed secret sharing with low communication cost.
Proceedings of the 2013 IEEE International Symposium on Information Theory, 2013
A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster.
Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, 2013
2012
Interference Alignment in Regenerating Codes for Distributed Storage: Necessity and Code Constructions.
IEEE Trans. Inf. Theory, 2012
Distributed Storage Codes With Repair-by-Transfer and Nonachievability of Interior Points on the Storage-Bandwidth Tradeoff.
IEEE Trans. Inf. Theory, 2012
Secret Share Dissemination across a Network.
CoRR, 2012
Regenerating codes for errors and erasures in distributed storage.
Proceedings of the 2012 IEEE International Symposium on Information Theory, 2012
2011
Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction.
IEEE Trans. Inf. Theory, 2011
Enabling node repair in any erasure code for distributed storage.
Proceedings of the 2011 IEEE International Symposium on Information Theory Proceedings, 2011
Information-Theoretically Secure Regenerating Codes for Distributed Storage.
Proceedings of the Global Communications Conference, 2011
2010
Distributed Storage Codes with Repair-by-Transfer and Non-achievability of Interior Points on the Storage-Bandwidth Tradeoff.
CoRR, 2010
The MISER Code: An MDS Distributed Storage Code that Minimizes Repair Bandwidth for Systematic Nodes through Interference Alignment.
CoRR, 2010
Regenerating Codes for Distributed Storage Networks.
Proceedings of the Arithmetic of Finite Fields, Third International Workshop, 2010
Explicit and optimal codes for distributed storage.
Proceedings of the Information Theory and Applications Workshop, 2010
A flexible class of regenerating codes for distributed storage.
Proceedings of the IEEE International Symposium on Information Theory, 2010
Explicit and optimal exact-regenerating codes for the minimum-bandwidth point in distributed storage.
Proceedings of the IEEE International Symposium on Information Theory, 2010
2009
Explicit Codes Minimizing Repair Bandwidth for Distributed Storage.
CoRR, 2009
Exact Regenerating Codes for Distributed Storage.
CoRR, 2009
Explicit construction of optimal exact regenerating codes for distributed storage.
Proceedings of the 47th Annual Allerton Conference on Communication, 2009