Zhuohan Li

ORCID: 0009-0004-1534-9106

Affiliations:
  • University of California, Berkeley, CA, USA


According to our database, Zhuohan Li authored at least 24 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
CoRR, 2024

Fairness in Serving Large Language Models.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
What is the State of Memory Saving for Model Training?
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On Optimizing the Communication of Model Parallelism.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

2022
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021
Rearchitecting In-Memory Object Stores for Low Latency.
Proc. VLDB Endow., 2021

Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems.
Proceedings of the ACM SIGCOMM 2021 Conference, 2021

Simple and Automatic Distributed Machine Learning on Ray.
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.
CoRR, 2020

Hoplite: Efficient Collective Communication for Task-Based Distributed Systems.
CoRR, 2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View.
CoRR, 2019

Fast Structured Decoding for Sequence Models.
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Efficient Training of BERT by Progressively Stacking.
Proceedings of the 36th International Conference on Machine Learning, 2019

Hint-Based Training for Non-Autoregressive Machine Translation.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018
Towards Binary-Valued Gates for Robust LSTM Training.
Proceedings of the 35th International Conference on Machine Learning, 2018
