Amar Phanishayee

ORCID: 0009-0001-2777-1118

According to our database, Amar Phanishayee authored at least 57 papers between 2006 and 2024.

Bibliography

2024
Data-driven Forecasting of Deep Learning Performance on GPUs.
CoRR, 2024

Workload-Aware Hardware Accelerator Mining for Distributed Deep Learning Training.
CoRR, 2024

Integrated Hardware Architecture and Device Placement Search.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

MGit: A Model Versioning and Management System.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Blox: A Modular Toolkit for Deep Learning Schedulers.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving.
CoRR, 2023

MGit: A Model Versioning and Management System.
CoRR, 2023

2022
Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers.
Proc. VLDB Endow., 2022

A Study on the Intersection of GPU Utilization and CNN Inference.
CoRR, 2022

Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021
Analyzing and Mitigating Data Stalls in DNN Training.
Proc. VLDB Endow., 2021

Synergy: Resource Sensitive DNN Scheduling in Multi-Tenant Clusters.
CoRR, 2021

Efficient Large-Scale Language Model Training on GPU Clusters.
CoRR, 2021

Efficient large-scale language model training on GPU clusters using Megatron-LM.
Proceedings of the International Conference for High Performance Computing, 2021

Piper: Multidimensional Planner for DNN Parallelization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Memory-Efficient Pipeline-Parallel DNN Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

Boosting the Throughput and Accelerator Utilization of Specialized CNN Inference Beyond Increasing Batch Size.
Proceedings of the 38th International Conference on Machine Learning, 2021

Doing more with less: training large DNN models on commodity servers for the masses.
Proceedings of the HotOS '21: Workshop on Hot Topics in Operating Systems, 2021

CheckFreq: Frequent, Fine-Grained DNN Checkpointing.
Proceedings of the 19th USENIX Conference on File and Storage Technologies, 2021

2020
Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training.
Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

Themis: Fair and Efficient GPU Cluster Scheduling.
Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, 2020

Efficient Algorithms for Device Placement of DNN Graph Operators.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Blink: Fast and Generic Collectives for Distributed ML.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

The Non-IID Data Quagmire of Decentralized Machine Learning.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Themis: Fair and Efficient GPU Cluster Scheduling for Machine Learning Workloads.
CoRR, 2019

Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

PipeDream: generalized pipeline parallelism for DNN training.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

The Case for Unifying Data Loading in Machine Learning Clusters.
Proceedings of the 11th USENIX Workshop on Hot Topics in Cloud Computing, 2019

2018
Compositional programming and testing of dynamic distributed systems.
Proc. ACM Program. Lang., 2018

PipeDream: Fast and Efficient Pipeline Parallel DNN Training.
CoRR, 2018

TBD: Benchmarking and Analyzing Deep Neural Network Training.
CoRR, 2018

Parameter Hub: High Performance Parameter Servers for Efficient Distributed Deep Neural Network Training.
CoRR, 2018

Gist: Efficient Data Encoding for Deep Neural Network Training.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Benchmarking and Analyzing Deep Neural Network Training.
Proceedings of the 2018 IEEE International Symposium on Workload Characterization, 2018

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training.
Proceedings of the ACM Symposium on Cloud Computing, 2018

2017
RAIL: A Case for Redundant Arrays of Inexpensive Links in Data Center Networks.
Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, 2017

Atomic In-place Updates for Non-volatile Main Memories with Kamino-Tx.
Proceedings of the Twelfth European Conference on Computer Systems, 2017

2016
Beam: Ending Monolithic Applications for Connected Devices.
Proceedings of the 2016 USENIX Annual Technical Conference, 2016

ProjecToR: Agile Reconfigurable Data Center Interconnect.
Proceedings of the ACM SIGCOMM 2016 Conference, Florianopolis, Brazil, August 22-26, 2016, 2016

Evaluation of elastic modulation gains in Microsoft's optical backbone in North America.
Proceedings of the Optical Fiber Communications Conference and Exhibition, 2016

2015
It's Time to End Monolithic Apps for Connected Devices.
;login: Usenix Mag., 2015

A Case for Ending Monolithic Apps for Connected Devices.
Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

2014
Bolt: Data Management for Connected Homes.
Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation, 2014

2013
HomeLab: a platform for conducting experiments with connected devices in the home.
Proceedings of the ACM SIGCOMM 2013 Conference, 2013

Lab of things: a platform for conducting studies with connected devices in multiple homes.
Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2013

2011
FAWN: a fast array of wimpy nodes.
Commun. ACM, 2011

2009
Safe and effective fine-grained TCP retransmissions for datacenter communication.
Proceedings of the ACM SIGCOMM 2009 Conference on Applications, 2009

FAWNdamentally Power-efficient Clusters.
Proceedings of HotOS'09: 12th Workshop on Hot Topics in Operating Systems, 2009

Scaling all-pairs overlay routing.
Proceedings of the 2009 ACM Conference on Emerging Networking Experiments and Technology, 2009

2008
Ditto: a system for opportunistic caching in multi-hop wireless networks.
Proceedings of the 14th Annual International Conference on Mobile Computing and Networking, 2008

Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems.
Proceedings of the 6th USENIX Conference on File and Storage Technologies, 2008

2007
On application-level approaches to avoiding TCP throughput collapse in cluster-based storage systems.
Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07), 2007

Ricochet: Lateral Error Correction for Time-Critical Multicast.
Proceedings of the 4th Symposium on Networked Systems Design and Implementation (NSDI 2007), 2007

Scalable Multicast Platforms for a New Generation of Robust Distributed Applications.
Proceedings of the Second International Conference on COMmunication System softWAre and MiddlewaRE (COMSWARE 2007), 2007

2006
PLATO: Predictive Latency-Aware Total Ordering.
Proceedings of the 25th IEEE Symposium on Reliable Distributed Systems (SRDS 2006), 2006
