Zhewei Yao

ORCID: 0000-0001-7678-4321

According to our database, Zhewei Yao authored at least 70 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
AI and Memory Wall.
IEEE Micro, 2024

SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation.
CoRR, 2024

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning.
CoRR, 2024

Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding.
CoRR, 2024

FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design.
CoRR, 2024

Quant-LLM: Accelerating the Serving of Large Language Models via FP6-Centric Algorithm-System Co-Design on Modern GPUs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

ZeRO++: Extremely Efficient Collective Communication for Large Model Training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks.
CoRR, 2023

ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers.
CoRR, 2023

DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention.
CoRR, 2023

RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model.
CoRR, 2023

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales.
CoRR, 2023

ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats.
CoRR, 2023

Selective Guidance: Are All the Denoising Steps of Guided Diffusion Important?
CoRR, 2023

A Comprehensive Study on Post-Training Quantization for Large Language Models.
CoRR, 2023

Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases.
CoRR, 2023

Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases.
Proceedings of the International Conference on Machine Learning, 2023

DySR: Adaptive Super-Resolution via Algorithm and System Co-design.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Scaling Vision-Language Models with Sparse Mixture of Experts.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing.
CoRR, 2022

Random-LTD: Random and Layerwise Token Dropping Brings Efficient Training for Large-scale Transformers.
CoRR, 2022

BiFeat: Supercharge GNN Training via Graph Feature Quantization.
CoRR, 2022

Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
CoRR, 2022

Hessian-Aware Pruning and Optimal Neural Implant.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

XTC: Extreme Compression for Pre-trained Transformers Made Simple and Efficient.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale.
Proceedings of the International Conference on Machine Learning, 2022

How Much Can CLIP Benefit Vision-and-Language Tasks?
Proceedings of the Tenth International Conference on Learning Representations, 2022

Integer-Only Zero-Shot Quantization for Efficient Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022

2021
Inexact Nonconvex Newton-Type Methods.
INFORMS J. Optim., January 2021

MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models.
CoRR, 2021

Q-ASR: Integer-only Zero-shot Quantization for Efficient Speech Recognition.
CoRR, 2021

A Survey of Quantization Methods for Efficient Neural Network Inference.
CoRR, 2021

Hessian-Aware Pruning and Optimal Neural Implant.
CoRR, 2021

HAWQ-V3: Dyadic Neural Network Quantization.
Proceedings of the 38th International Conference on Machine Learning, 2021

I-BERT: Integer-only BERT Quantization.
Proceedings of the 38th International Conference on Machine Learning, 2021

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training.
Proceedings of the 38th International Conference on Machine Learning, 2021

What's Hidden in a One-layer Randomly Weighted Transformer?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Improving Semi-supervised Federated Learning by Reducing the Gradient Diversity of Models.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
HAWQV3: Dyadic Neural Network Quantization.
CoRR, 2020

Benchmarking Semi-supervised Federated Learning.
CoRR, 2020

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning.
CoRR, 2020

Rethinking Batch Normalization in Transformers.
CoRR, 2020

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

A Statistical Framework for Low-bitwidth Training of Deep Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

JumpReLU: A Retrofit Defense Strategy for Adversarial Attacks.
Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, 2020

PowerNorm: Rethinking Batch Normalization in Transformers.
Proceedings of the 37th International Conference on Machine Learning, 2020

MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

ZeroQ: A Novel Zero Shot Quantization Framework.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

PyHessian: Neural Networks Through the Lens of the Hessian.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Inefficiency of K-FAC for Large Batch Size Training.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Improved CycleGANs for Intravascular Ultrasound Image Enhancement (改进型循环生成对抗网络的血管内超声图像增强).
计算机科学 (Computer Science), 2019

HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks.
CoRR, 2019

ANODEV2: A Coupled Neural ODE Evolution Framework.
CoRR, 2019

Residual Networks as Nonlinear Systems: Stability Analysis using Linearization.
CoRR, 2019

Shallow Learning for Fluid Flow Reconstruction with Limited Sensors and Limited Data.
CoRR, 2019

ANODEV2: A Coupled Neural ODE Framework.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Trust Region Based Adversarial Attack on Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Parameter Re-Initialization through Cyclical Batch Size Schedules.
CoRR, 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent.
CoRR, 2018

Large batch size training of neural networks with adversarial training and second-order information.
CoRR, 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

2017
A Hybrid Adaptive MCMC Algorithm in Function Spaces.
SIAM/ASA J. Uncertain. Quantification, 2017

On an adaptive preconditioned Crank-Nicolson MCMC algorithm for infinite dimensional Bayesian inference.
J. Comput. Phys., 2017

Nonlocal total variation based on symmetric Kullback-Leibler divergence for the ultrasound image despeckling.
BMC Medical Imaging, 2017

