Chengming Zhang

Orcid: 0000-0003-3008-9133

Affiliations:

Washington State University, Pullman, WA, USA
University of Alabama, AL, USA

According to our database, Chengming Zhang authored at least 25 papers between 2020 and 2025.

Collaborative distances:

Dijkstra number of three.
Erdős number of three.

Bibliography

2025

High-performance Visual Semantics Compression for AI-Driven Science.

Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025

2024

GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors.

CoRR, 2024

AdaCM²: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction.

CoRR, 2024

Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier.

CoRR, 2024

System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.

Reza Yazdani Aminabadi

Shuaiwen Leon Song

Samyam Rajbhandari

Yuxiong He

Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.

Reza Yazdani Aminabadi

Shuaiwen Leon Song

Samyam Rajbhandari

Yuxiong He

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.

Cindy Orozco Bohorquez

Massimiliano Lupo Pasini

CoRR, 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.

CoRR, 2023

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors.

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.

Lizhi Xiang

Miao Yin

Chengming Zhang

Aravind Sukumaran-Rajam

P. Sadayappan

Bo Yuan

Dingwen Tao

Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.

Proceedings of the 37th International Conference on Supercomputing, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates.

CoRR, 2022

CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021

COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression.

Proc. VLDB Endow., 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.

CoRR, 2021

Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI.

Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020

An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.

CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.

CoRR, 2020

waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data.

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

CurvaNet: Geometric Deep Learning based on Directional Curvature for 3D Shape Analysis.

Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
