Olatunji Ruwase
ORCID: 0000-0002-5508-0728
According to our database, Olatunji Ruwase authored at least 42 papers between 2004 and 2024.
Bibliography
2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping.
CoRR, 2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer.
CoRR, 2024
Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training.
CoRR, 2024
CoRR, 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
CoRR, 2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules.
Proceedings of the International Conference for High Performance Computing, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
2023
ACM Trans. Embed. Comput. Syst., March, 2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
CoRR, 2023
A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
CoRR, 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
2022
DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
2021
Proceedings of the 2021 USENIX Annual Technical Conference, 2021
Proceedings of the International Conference for High Performance Computing, 2021
SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
2020
Proceedings of the International Conference for High Performance Computing, 2020
DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
2019
LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019
2018
IEEE Trans. Netw. Serv. Manag., 2018
2017
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
2016
SERF: efficient scheduling for fast deep neural network serving via judicious parallelism.
Proceedings of the International Conference for High Performance Computing, 2016
2015
Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems.
Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Toward accelerating deep learning at scale using specialized hardware in the datacenter.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015
2014
Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
2013
PhD thesis, 2013
2010
Decoupled lifeguards: enabling path optimizations for dynamic correctness checking tools.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010
2009
2008
Proceedings of the SPAA 2008: Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2008
Proceedings of the 14th Annual International Conference on Mobile Computing and Networking, 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
2004
Proceedings of the Network and Distributed System Security Symposium, 2004