Peng Sun

ORCID: 0000-0001-8456-0491

Affiliations:
  • SenseTime Research, China
  • Shanghai AI Laboratory, Shanghai, China
  • Nanyang Technological University, Energy Research Institute, Interdisciplinary Graduate School, Singapore


According to our database, Peng Sun authored at least 44 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
UniSched: A Unified Scheduler for Deep Learning Training Jobs With Different User Demands.
IEEE Trans. Computers, June 2024

Deep Learning Workload Scheduling in GPU Datacenters: A Survey.
ACM Comput. Surv., June 2024

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism.
CoRR, 2024

InternLM2 Technical Report.
CoRR, 2024

InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding.
CoRR, 2024

FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices.
Proceedings of the ACM on Web Conference 2024, 2024

LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

TorchGT: A Holistic System for Large-Scale Graph Transformer Training.
Proceedings of the International Conference for High Performance Computing, 2024

dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

Characterization of Large Language Model Development in the Datacenter.
Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Lins: Reducing Communication Overhead of ZeRO for Efficient LLM Training.
Proceedings of the 32nd IEEE/ACM International Symposium on Quality of Service, 2024

Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

AutoSched: An Adaptive Self-configured Framework for Scheduling Deep Learning Training Workloads.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Sylvie: 3D-Adaptive and Universal System for Large-Scale Graph Neural Network Training.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning.
CoRR, 2023

Boosting Distributed Full-graph GNN Training with Asynchronous One-bit Communication.
CoRR, 2023

Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Lucid: A Non-intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Astraea: A Fair Deep Learning Scheduler for Multi-Tenant GPU Clusters.
IEEE Trans. Parallel Distributed Syst., 2022

GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training.
IEEE Trans. Big Data, 2022

Deep Learning Workload Scheduling in GPU Datacenters: Taxonomy, Challenges and Vision.
CoRR, 2022

A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs.
CoRR, 2022

Primo: Practical Learning-Augmented Systems with Interpretable Models.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

Titan: a scheduler for foundation model fine-tuning workloads.
Proceedings of the 13th ACM Symposium on Cloud Computing (SoCC 2022), 2022

2021
ModelCI-e: Enabling Continual Learning in Deep Learning Serving Systems.
CoRR, 2021

Characterization and prediction of deep learning workloads in large-scale GPU datacenters.
Proceedings of the International Conference for High Performance Computing, 2021

Chronus: A Novel Deadline-aware Scheduler for Deep Learning Training Jobs.
Proceedings of the ACM Symposium on Cloud Computing (SoCC '21), 2021

2020
GraphMP: I/O-Efficient Big Graph Analytics on a Single Commodity Machine.
IEEE Trans. Big Data, 2020

Elan: Towards Generic and Efficient Elastic Training for Deep Learning.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

2019
Scalable Architectures for Big Data Analysis.
Encyclopedia of Big Data Technologies, 2019

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes.
CoRR, 2019

2018
MetaFlow: A Scalable Metadata Lookup Service for Distributed File Systems in Data Centers.
IEEE Trans. Big Data, 2018

On Distributed Algorithms for Cost-Efficient Data Center Placement in Cloud Computing.
CoRR, 2018

Speeding-Up Age Estimation in Intelligent Demographics System via Network Optimization.
Proceedings of the 2018 IEEE International Conference on Communications, 2018

2017
Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach.
Proceedings of the 2017 IEEE International Conference on Smart Computing, 2017

GraphMP: An Efficient Semi-External-Memory Big Graph Processing System on a Single Machine.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

GraphH: High Performance Big Graph Analytics in Small Clusters.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Timed Dataflow: Reducing Communication Overhead for Distributed Machine Learning Systems.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

2014
CREATE: Correlation enhanced traffic matrix estimation in Data Center Networks.
Proceedings of the 2014 IFIP Networking Conference, Trondheim, 2014

2013
Cloud3DView: an interactive tool for cloud data center operations.
Proceedings of the ACM SIGCOMM 2013 Conference, 2013