Zhuohan Li

Orcid: 0009-0004-1534-9106

Affiliations:

University of California, Berkeley, CA, USA

According to our database, Zhuohan Li authored at least 24 papers between 2018 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Timeline: publications per year from 2018 to 2024, by type (book, in proceedings, article, PhD thesis, dataset, other).

Bibliography

2024

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.

CoRR, 2024

Fairness in Serving Large Language Models.

Co-authors include Joseph E. Gonzalez.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.

Co-authors include Joseph E. Gonzalez.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

What is the State of Memory Saving for Model Training?

CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.

Co-authors include Clark W. Barrett, Joseph E. Gonzalez, and Christopher Ré.

CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.

Co-authors include Joseph Gonzalez.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.

Co-authors include Joseph E. Gonzalez.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.

Co-authors include Joseph E. Gonzalez.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On Optimizing the Communication of Model Parallelism.

Co-authors include Joseph Gonzalez.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.

Co-authors include Christopher Ré.

Proceedings of the International Conference on Machine Learning, 2023

2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.

Co-authors include Joseph E. Gonzalez.

CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.

Co-authors include Joseph E. Gonzalez.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021

Rearchitecting In-Memory Object Stores for Low Latency.

Proc. VLDB Endow., 2021

Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems.

Co-authors include Robert Nishihara.

Proceedings of the ACM SIGCOMM 2021 Conference, Virtual Event, USA, August 23-27, 2021

Simple and Automatic Distributed Machine Learning on Ray.

Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.

Proceedings of the 38th International Conference on Machine Learning, 2021

2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.

Co-authors include Joseph E. Gonzalez.

CoRR, 2020

Hoplite: Efficient Collective Communication for Task-Based Distributed Systems.

Co-authors include Robert Nishihara.

CoRR, 2020

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers.

Proceedings of the 37th International Conference on Machine Learning, 2020

2019

Understanding and Improving Transformer From a Multi-Particle Dynamic System Point of View.

CoRR, 2019

Fast Structured Decoding for Sequence Models.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Efficient Training of BERT by Progressively Stacking.

Proceedings of the 36th International Conference on Machine Learning, 2019

Hint-Based Training for Non-Autoregressive Machine Translation.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

2018

Towards Binary-Valued Gates for Robust LSTM Training.

Proceedings of the 35th International Conference on Machine Learning, 2018
