Yanghua Peng

Orcid: 0000-0003-3989-4358

According to our database, Yanghua Peng authored at least 24 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

[Chart: number of publications per year, 2017 to 2024, broken down by type (book, in proceedings, article, PhD thesis, dataset, other).]

Bibliography

2024
HybridFlow: A Flexible and Efficient RLHF Framework.
CoRR, 2024

Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation.
CoRR, 2024

ByteCheckpoint: A Unified Checkpointing System for LLM Development.
CoRR, 2024

LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
CoRR, 2024

POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
Deep Learning-Based Job Placement in Distributed Machine Learning Clusters With Heterogeneous Workloads.
IEEE/ACM Trans. Netw., April, 2023

SP-GNN: Learning structure and position information from graphs.
Neural Networks, April, 2023

BGL: GPU-Efficient GNN Training by Optimizing Graph Data I/O and Preprocessing.
Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation, 2023

2022
dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training.
CoRR, 2022

Multi-resource interleaving for deep learning training.
Proceedings of the SIGCOMM '22: ACM SIGCOMM 2022 Conference, Amsterdam, The Netherlands, August 22, 2022

SAPipe: Staleness-Aware Pipeline for Data Parallel DNN Training.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

2021
DL2: A Deep Learning-Driven Scheduler for Deep Learning Clusters.
IEEE Trans. Parallel Distributed Syst., 2021

2020
Preemptive All-reduce Scheduling for Expediting Distributed DNN Training.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020

Elastic parameter server load distribution in deep learning clusters.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019
A generic communication scheduler for distributed DNN training acceleration.
Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019

Deep Learning-based Job Placement in Distributed Machine Learning Clusters.
Proceedings of the 2019 IEEE Conference on Computer Communications, 2019

2018
Online Job Scheduling in Distributed Machine Learning Clusters.
Proceedings of the 2018 IEEE Conference on Computer Communications, 2018

Optimus: an efficient dynamic resource scheduler for deep learning clusters.
Proceedings of the Thirteenth EuroSys Conference, 2018

2017
Dynamic Scaling of Virtualized, Distributed Service Chains: A Case Study of IMS.
IEEE J. Sel. Areas Commun., 2017

deTector: a Topology-aware Monitoring System for Data Center Networks.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017

