Elias Frantar

Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

Error Feedback Can Accurately Compress Preconditioners.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SPADE: Sparsity-Guided Debugging for Deep Neural Networks.

Arshia Soltani Moakhar

Eugenia Iofinova

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Extreme Compression of Large Language Models via Additive Quantization.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Scaling Laws for Sparsely-Connected Foundation Models.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023

QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models.

CoRR, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models.

CoRR, 2023

Sparse Fine-tuning for Inference Acceleration of Large Language Models.

CoRR, 2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models.

CoRR, 2023

JaxPruner: A concise library for sparsity research.

CoRR, 2023

Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression.

CoRR, 2023

ZipLM: Hardware-Aware Structured Pruning of Language Models.

CoRR, 2023

CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

ZipLM: Inference-Aware Structured Pruning of Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot.

Mohammadreza Alimohammadi

Proceedings of the International Conference on Machine Learning, 2023

OPTQ: Accurate Quantization for Generative Pre-trained Transformers.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

L-GreCo: An Efficient and General Framework for Layerwise-Adaptive Gradient Compression.

Ilia Markov

CoRR, 2022

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.

CoRR, 2022

oViT: An Accurate Second-Order Pruning Framework for Vision Transformers.

CoRR, 2022

Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

SPDY: Accurate Pruning with Speedup Guarantees.

Proceedings of the International Conference on Machine Learning, 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021

Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization.

CoRR, 2021

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.
