Zhang Zhang

Orcid: 0009-0001-4427-7015

Affiliations:
  • ByteDance


According to our database, Zhang Zhang authored at least 4 papers in 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Minder: Faulty Machine Detection for Large-scale Distributed Model Training.
CoRR, 2024

R-Pingmesh: A Service-Aware RoCE Network Monitoring and Diagnostic System.
Proceedings of the ACM SIGCOMM 2024 Conference, 2024

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Hostmesh: Monitor and Diagnose Networks in Rail-optimized RoCE Clusters.
Proceedings of the 8th Asia-Pacific Workshop on Networking, 2024

