Hao Zhang

ORCID: 0009-0003-8392-3977

Affiliations:
  • University of California, San Diego, CA, USA
  • University of California, Berkeley, CA, USA


According to our database, Hao Zhang authored at least 17 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput.
CoRR, 2024

Toward Inference-optimal Mixture-of-Expert Large Language Models.
CoRR, 2024

MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving.
CoRR, 2024

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
CoRR, 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding.
CoRR, 2024

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

2023
Online Speculative Decoding.
CoRR, 2023

LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers.
CoRR, 2023

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Advances in Neural Information Processing Systems 36, 2023

On Optimizing the Communication of Model Parallelism.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

2022
Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021
Simple and Automatic Distributed Machine Learning on Ray.
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models.
Proceedings of the 38th International Conference on Machine Learning, 2021
