Olatunji Ruwase

ORCID: 0000-0002-5508-0728

According to our database, Olatunji Ruwase authored at least 42 papers between 2004 and 2024.

Bibliography

2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping.
CoRR, 2024

Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer.
CoRR, 2024

Universal Checkpointing: Efficient and Flexible Checkpointing for Large Scale Distributed Training.
CoRR, 2024

FastPersist: Accelerating Model Checkpointing in Deep Learning.
CoRR, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.
CoRR, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
CoRR, 2024

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024

Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules.
Proceedings of the International Conference for High Performance Computing, 2024

ZeRO++: Extremely Efficient Collective Communication for Large Model Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
SHARP: An Adaptable, Energy-Efficient Accelerator for Recurrent Neural Networks.
ACM Trans. Embed. Comput. Syst., March, 2023

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023

ZeRO++: Extremely Efficient Collective Communication for Giant Model Training.
CoRR, 2023

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
CoRR, 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Proceedings of the 37th International Conference on Supercomputing, 2023

2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model.
CoRR, 2022

DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale.
Proceedings of the International Conference for High Performance Computing (SC22), 2022

2021
ZeRO-Offload: Democratizing Billion-Scale Model Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

ZeRO-Infinity: breaking the GPU memory wall for extreme scale deep learning.
Proceedings of the International Conference for High Performance Computing, 2021

SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training using Gradient Similarity Measurement.
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
ZeRO: memory optimizations toward training trillion parameter models.
Proceedings of the International Conference for High Performance Computing, 2020

DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters.
Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '20), 2020

2019
LSTM-Sharp: An Adaptable, Energy-Efficient Hardware Accelerator for Long Short-Term Memory.
CoRR, 2019

ZeRO: Memory Optimization Towards Training A Trillion Parameter Models.
CoRR, 2019

Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft.
Proceedings of the 2019 USENIX Conference on Operational Machine Learning, 2019

2018
Efficient Deep Neural Network Serving: Fast and Furious.
IEEE Trans. Netw. Serv. Manag., 2018

2017
HyperDrive: exploring hyperparameters with POP scheduling.
Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, Las Vegas, NV, USA, December 11, 2017

Optimizing CNNs on Multicores for Scalability, Performance and Goodput.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
SERF: efficient scheduling for fast deep neural network serving via judicious parallelism.
Proceedings of the International Conference for High Performance Computing, 2016

2015
Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems.
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015

Page overlays: an enhanced virtual memory framework to enable fine-grained memory management.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Toward accelerating deep learning at scale using specialized hardware in the datacenter.
Proceedings of the 2015 IEEE Hot Chips 27 Symposium (HCS), 2015

2014
Guardrail: a high fidelity approach to protecting hardware devices from buggy drivers.
Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2014

2013
Improving Device Driver Reliability through Decoupled Dynamic Binary Analyses.
PhD thesis, 2013

2010
Decoupled lifeguards: enabling path optimizations for dynamic correctness checking tools.
Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2010

2009
Flexible Hardware Acceleration for Instruction-Grain Lifeguards.
IEEE Micro, 2009

2008
Parallelizing dynamic information flow tracking.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Ditto: a system for opportunistic caching in multi-hop wireless networks.
Proceedings of the 14th Annual International Conference on Mobile Computing and Networking, 2008

Flexible Hardware Acceleration for Instruction-Grain Program Monitoring.
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

2004
A Practical Dynamic Buffer Overflow Detector.
Proceedings of the Network and Distributed System Security Symposium, 2004

