Beidi Chen

According to our database, Beidi Chen authored at least 68 papers between 2016 and 2024.

Bibliography

2024
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
CoRR, 2024

MagicPIG: LSH Sampling for Efficient LLM Generation.
CoRR, 2024

Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild.
CoRR, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs.
CoRR, 2024

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding.
CoRR, 2024

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training.
CoRR, 2024

VcLLM: Video Codecs are Secretly Tensor Codecs.
CoRR, 2024

It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF.
CoRR, 2024

Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity.
CoRR, 2024

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices.
CoRR, 2024

Nearest Neighbor Speculative Decoding for LLM Generation and Attribution.
CoRR, 2024

Memory Mosaics.
CoRR, 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding.
CoRR, 2024

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length.
CoRR, 2024

Prompt-prompted Mixture of Experts for Efficient LLM Generation.
CoRR, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
CoRR, 2024

LLM Inference Unveiled: Survey and Roofline Model Insights.
CoRR, 2024

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding.
CoRR, 2024

Learn To be Efficient: Build Structured Sparsity in Large Language Models.
CoRR, 2024

Q-Hitter: A Better Token Oracle for Efficient LLM Inference via Sparse-Quantized KV Cache.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Soft Prompt Recovers Compressed LLMs, Transferably.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

HexGen: Generative Inference of Large Language Model over Heterogeneous Environment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LoCoCo: Dropping In Convolutions for Long Context Compression.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Efficient Streaming Language Models with Attention Sinks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
CoRR, 2023

InRank: Incremental Low-Rank Learning.
CoRR, 2023

Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt.
CoRR, 2023

High-throughput Generative Inference of Large Language Models with a Single GPU.
CoRR, 2023

Modeling Scattering Coefficients using Self-Attentive Complex Polynomials with Image-based Representation.
CoRR, 2023

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks.
Proceedings of the International Conference on Machine Learning, 2023

Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time.
Proceedings of the International Conference on Machine Learning, 2023

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU.
Proceedings of the International Conference on Machine Learning, 2023

Fast Algorithms for a New Relaxation of Optimal Transport.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees.
CoRR, 2022

Decentralized Training of Foundation Models in Heterogeneous Environments.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

HALOS: Hashing Large Output Space for Cheap Inference.
Proceedings of the Fifth Conference on Machine Learning and Systems, 2022

Monarch: Expressive Structured Matrices for Efficient and Accurate Training.
Proceedings of the International Conference on Machine Learning, 2022

Pixelated Butterfly: Simple and Efficient Sparse Training for Neural Network Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Satellite Images and Deep Learning to Identify Discrepancy in Mailing Addresses with Applications to Census 2020 in Houston.
CoRR, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention Approximation.
CoRR, 2021

Locality Sensitive Teaching.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Scatterbrain: Unifying Sparse and Low-rank Attention.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Tale of Two Efficient and Informative Negative Sampling Distributions.
Proceedings of the 38th International Conference on Machine Learning, 2021

SOLAR: Sparse Orthogonal Learned and Random Embeddings.
Proceedings of the 9th International Conference on Learning Representations, 2021

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
A Constant-time Adaptive Negative Sampling.
CoRR, 2020

Climbing the WOL: Training for Cheaper Inference.
CoRR, 2020

SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems.
Proceedings of the Third Conference on Machine Learning and Systems, 2020

Angular Visual Hardness.
Proceedings of the 37th International Conference on Machine Learning, 2020

2019
Sub-Linear Privacy-Preserving Near-Neighbor Search.
IACR Cryptol. ePrint Arch., 2019

LSH-Sampling Breaks the Computation Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation.
CoRR, 2019

SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems.
CoRR, 2019

Fast and Accurate Stochastic Gradient Estimation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Densified Winner Take All (WTA) Hashing for Sparse Datasets.
Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, 2018

LSH-Sampling Breaks the Computational Chicken-and-Egg Loop in Adaptive Stochastic Gradient Estimation.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Unique Entity Estimation with Application to the Syrian Conflict.
CoRR, 2017

2016
Sub-linear Privacy-preserving Search with Untrusted Server and Semi-honest Parties.
CoRR, 2016

Revisiting Winner Take All (WTA) Hashing for Sparse Datasets.
CoRR, 2016
