Peng Sun
ORCID: 0000-0001-8456-0491
Affiliations:
- SenseTime Research, China
- Shanghai AI Laboratory, Shanghai, China
- Nanyang Technological University, Energy Research Institute, Interdisciplinary Graduate School, Singapore
According to our database, Peng Sun authored at least 44 papers between 2013 and 2024.
Bibliography
2024
UniSched: A Unified Scheduler for Deep Learning Training Jobs With Different User Demands.
IEEE Trans. Computers, June, 2024
ACM Comput. Surv., June, 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey.
CoRR, 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024
CoRR, 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding.
CoRR, 2024
FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices.
Proceedings of the ACM on Web Conference 2024, 2024
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024
Proceedings of the International Conference for High Performance Computing, 2024
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024
Proceedings of the 32nd IEEE/ACM International Symposium on Quality of Service, 2024
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Sylvie: 3D-Adaptive and Universal System for Large-Scale Graph Neural Network Training.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024
Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication.
CoRR, 2023
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023
2022
IEEE Trans. Parallel Distributed Syst., 2022
GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training.
IEEE Trans. Big Data, 2022
Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision.
CoRR, 2022
A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs.
CoRR, 2022
Proceedings of the 2022 USENIX Annual Technical Conference, 2022
Proceedings of the 13th Symposium on Cloud Computing, SoCC 2022, 2022
2021
Characterization and prediction of deep learning workloads in large-scale GPU datacenters.
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the SoCC '21: ACM Symposium on Cloud Computing, 2021
2020
IEEE Trans. Big Data, 2020
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020
2019
Proceedings of the Encyclopedia of Big Data Technologies, 2019
Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes.
CoRR, 2019
2018
MetaFlow: A Scalable Metadata Lookup Service for Distributed File Systems in Data Centers.
IEEE Trans. Big Data, 2018
On Distributed Algorithms for Cost-Efficient Data Center Placement in Cloud Computing.
CoRR, 2018
Speeding-Up Age Estimation in Intelligent Demographics System via Network Optimization.
Proceedings of the 2018 IEEE International Conference on Communications, 2018
2017
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach.
Proceedings of the 2017 IEEE International Conference on Smart Computing, 2017
GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
2016
Timed Dataflow: Reducing Communication Overhead for Distributed Machine Learning Systems.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
2014
Proceedings of the 2014 IFIP Networking Conference, Trondheim, 2014
2013
Proceedings of the ACM SIGCOMM 2013 Conference, 2013