Shaohuai Shi
ORCID: 0000-0002-1418-5160
According to our database, Shaohuai Shi authored at least 64 papers between 2010 and 2024.
Bibliography
2024
Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems.
CoRR, 2024
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference.
CoRR, 2024
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression.
CoRR, 2024
Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing.
Proceedings of the 32nd IEEE/ACM International Symposium on Quality of Service, 2024
Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules.
Proceedings of the IEEE INFOCOM 2024, 2024
Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning.
Proceedings of the 53rd International Conference on Parallel Processing, 2024
Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning.
Proceedings of the 53rd International Conference on Parallel Processing, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024
Performance Analysis and Optimizations of Matrix Multiplications on ARMv8 Processors.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024
2023
GossipFL: A Decentralized Federated Learning Framework With Sparsified and Adaptive Communication.
IEEE Trans. Parallel Distributed Syst., March, 2023
IEEE Trans. Cloud Comput., 2023
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models.
CoRR, 2023
Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining.
CoRR, 2023
FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs.
CoRR, 2023
CoRR, 2023
CoRR, 2023
A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games.
CoRR, 2023
FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training.
CoRR, 2023
CoRR, 2023
Accelerating Distributed K-FAC with Efficient Collective Communication and Scheduling.
Proceedings of the IEEE INFOCOM 2023, 2023
Proceedings of the IEEE INFOCOM 2023, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023
DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining.
Proceedings of the 43rd IEEE International Conference on Distributed Computing Systems, 2023
2022
CoRR, 2022
Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters.
CoRR, 2022
Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning.
IEEE Trans. Parallel Distributed Syst., 2021
IEEE Netw., 2021
CoRR, 2021
Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans.
CoRR, 2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Exploiting Simultaneous Communications to Accelerate Data Parallel Distributed Deep Learning.
Proceedings of the 40th IEEE Conference on Computer Communications, 2021
Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks.
Proceedings of the 41st IEEE International Conference on Distributed Computing Systems, 2021
Automated Model Design and Benchmarking of Deep Learning Models for COVID-19 Detection with Chest CT Scans.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges.
CoRR, 2020
CoRR, 2020
CoRR, 2020
Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs.
Proceedings of the 39th IEEE Conference on Computer Communications, 2020
Proceedings of the 2020 IEEE International Conference on Robotics and Automation, 2020
Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format.
Proceedings of the 26th IEEE International Conference on Parallel and Distributed Systems, 2020
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020
Layer-Wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees.
Proceedings of the ECAI 2020 - 24th European Conference on Artificial Intelligence, 29 August-8 September 2020, Santiago de Compostela, Spain, 2020
Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020
2019
Proceedings of the 2019 IEEE Conference on Computer Communications, 2019
A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019
Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019
2018
CoRR, 2018
Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes.
CoRR, 2018
Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs.
CoRR, 2018
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018
Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, 2018
2017
CoRR, 2017
Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose.
CoRR, 2017
Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units.
CoRR, 2017
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017
Proceedings of the 3rd International Conference on Big Data Computing and Communications, 2017
2016
Proceedings of the 7th International Conference on Cloud Computing and Big Data, 2016
2011
Proceedings of the 14th IEEE International Conference on Computational Science and Engineering, 2011
2010
Proceedings of the 10th IEEE International Conference on Computer and Information Technology, 2010