Zhewei Yao
Orcid: 0000-0001-7678-4321
According to our database1,
Zhewei Yao
authored at least 70 papers
between 2017 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2024
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation.
CoRR, 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
CoRR, 2024
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024
Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023
ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers.
CoRR, 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023
CoRR, 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats.
CoRR, 2023
CoRR, 2023
CoRR, 2023
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.
CoRR, 2023
Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases.
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
CoRR, 2022
Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022
CoRR, 2022
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the Tenth International Conference on Learning Representations, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
2021
CoRR, 2021
CoRR, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
Proceedings of the 38th International Conference on Machine Learning, 2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Improving Semi-supervised Federated Learning by Reducing the Gradient Diversity of Models.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, 2020
Proceedings of the 37th International Conference on Machine Learning, 2020
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
改进型循环生成对抗网络的血管内超声图像增强 (Improved CycleGANs for Intravascular Ultrasound Image Enhancement).
计算机科学, 2019
CoRR, 2019
Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data.
CoRR, 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent.
CoRR, 2018
Large batch size training of neural networks with adversarial training and second-order information.
CoRR, 2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
2017
SIAM/ASA J. Uncertain. Quantification, 2017
On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference.
J. Comput. Phys., 2017
Nonlocal total variation based on symmetric Kullback-Leibler divergence for the ultrasound image despeckling.
BMC Medical Imaging, 2017