Samyam Rajbhandari
Orcid: 0000-0002-0386-8759
According to our database, Samyam Rajbhandari authored at least 39 papers between 2012 and 2024.
Bibliography
2024
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation.
CoRR, 2024
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference.
CoRR, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024
System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
2023
DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
CoRR, 2023
A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
CoRR, 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
2022
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model.
CoRR, 2022
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Proceedings of the International Conference for High Performance Computing, 2021
SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed.
Proceedings of the 38th International Conference on Machine Learning, 2021
2020
Knowl. Inf. Syst., 2020
CoRR, 2020
Proceedings of the International Conference for High Performance Computing, 2020
DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
2019
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019
2018
Proceedings of the 2018 USENIX Annual Technical Conference, 2018
Proceedings of the 6th International Conference on Learning Representations, 2018
2017
Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment.
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 25th International Conference on Compiler Construction, 2016
2014
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
2013
A framework for load balancing of tensor contraction expressions via dynamic task partitioning.
Proceedings of the International Conference for High Performance Computing, 2013
2012
Proceedings of the International Conference on Computational Science, 2012