Lianmin Zheng

ORCID: 0000-0002-6611-4612

According to our database, Lianmin Zheng authored at least 34 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Post-Training Sparse Attention with Double Sparsity.
CoRR, 2024

SLoRA: Scalable Serving of Thousands of LoRA Adapters.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Efficiently Programming Large Language Models using SGLang.
CoRR, 2023

Rethinking Benchmark and Contamination for Language Models with Rephrased Samples.
CoRR, 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
CoRR, 2023

On Optimal Caching and Model Multiplexing for Large Model Inference.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Towards Optimal Caching and Model Selection for Large Model Inference.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

On Optimizing the Communication of Model Parallelism.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

TensorIR: An Abstraction for Automatic Tensorized Program Optimization.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
NumS: Scalable Array Programming for the Cloud.
CoRR, 2022

GACT: Activation Compressed Training for General Architectures.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
CoRR, 2022

Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

GACT: Activation Compressed Training for Generic Network Architectures.
Proceedings of the International Conference on Machine Learning, 2022

2021
TenSet: A Large-scale Program Performance Dataset for Learned Tensor Compilers.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Simple and Automatic Distributed Machine Learning on Ray.
Proceedings of KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

2020
Ansor: Generating High-Performance Tensor Programs for Deep Learning.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

2019
A Hardware-Software Blueprint for Flexible Deep Learning Specialization.
IEEE Micro, 2019

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Size-to-depth: A New Perspective for Single Image Depth Estimation.
CoRR, 2018

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning.
Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation, 2018

Learning to Optimize Tensor Programs.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence.
CoRR, 2017
