Ying Sheng

ORCID: 0000-0002-1883-2126

Affiliations:
  • Stanford University, CA, USA


According to our database, Ying Sheng authored at least 29 papers between 2020 and 2024.

Bibliography

2024
Inference-Friendly Models With MixAttention.
CoRR, 2024

Post-Training Sparse Attention with Double Sparsity.
CoRR, 2024

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors.
CoRR, 2024

DafnyBench: A Benchmark for Formal Software Verification.
CoRR, 2024

Clover: Closed-Loop Verifiable Code Generation.
Proceedings of the AI Verification - First International Symposium, 2024

Fairness in Serving Large Language Models.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

SLoRA: Scalable Serving of Thousands of LoRA Adapters.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Combining Stable Infiniteness and (Strong) Politeness.
J. Autom. Reason., December, 2023

Reasoning About Vectors: Satisfiability Modulo a Theory of Sequences.
J. Autom. Reason., September, 2023

Efficiently Programming Large Language Models using SGLang.
CoRR, 2023

S-LoRA: Serving Thousands of Concurrent LoRA Adapters.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
CoRR, 2023

On Optimal Caching and Model Multiplexing for Large Model Inference.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Efficient Memory Management for Large Language Model Serving with PagedAttention.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Towards Optimal Caching and Model Selection for Large Model Inference.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

2022
Read-once refutations in Horn constraint systems: an algorithmic approach.
J. Log. Comput., 2022

Polite Combination of Algebraic Datatypes.
J. Autom. Reason., 2022

cvc5: A Versatile and Industrial-Strength SMT Solver.
Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems, 2022

Reasoning About Vectors Using an SMT Theory of Sequences.
Proceedings of the Automated Reasoning - 11th International Joint Conference, 2022

2021
Politeness for the Theory of Algebraic Datatypes (Extended Abstract).
Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, 2021

Politeness and Stable Infiniteness: Stronger Together.
Proceedings of the Automated Deduction - CADE 28, 2021

2020
Politeness for the Theory of Algebraic Datatypes.
Proceedings of the Automated Reasoning - 10th International Joint Conference, 2020
