Conglong Li
According to our database, Conglong Li authored at least 24 papers between 2013 and 2024.
Bibliography
2024
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs.
CoRR, 2024
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
CoRR, 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022
CoRR, 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022
1-bit LAMB: Communication Efficient Large-Scale Large-Batch Training with LAMB's Convergence Speed.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
2021
Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training.
CoRR, 2021
1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed.
Proceedings of the 38th International Conference on Machine Learning, 2021
2020
Improving Approximate Nearest Neighbor Search through Learned Adaptive Early Termination.
Proceedings of the 2020 International Conference on Management of Data, 2020
2019
Proceedings of the Second Conference on Machine Learning and Systems (SysML), 2019
2018
Proceedings of the 2018 World Wide Web Conference on World Wide Web, 2018
2017
Proceedings of the 2017 Symposium on Cloud Computing (SoCC), 2017
Proceedings of the ACM/IEEE Symposium on Architectures for Networking and Communications Systems, 2017
2015
Proceedings of the Tenth European Conference on Computer Systems, 2015
Proceedings of the 11th ACM Conference on Emerging Networking Experiments and Technologies, 2015
2013
ACM Transactions on Architecture and Code Optimization, 2013