Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters.
Dataset, January, 2024
Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters.
Proceedings of the 25th International Middleware Conference, 2024
Derm: SLA-aware Resource Management for Highly Dynamic Microservices.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024
Heet: Accelerating Elastic Training in Heterogeneous Deep Learning Clusters.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach.
Dataset, June, 2023
Interference-aware Multiplexing for Deep Learning in GPU Clusters: A Middleware Approach.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023