Chengming Zhang

Orcid: 0000-0003-3008-9133

Affiliations:
  • Washington State University, Pullman, WA, USA
  • University of Alabama, AL, USA


According to our database, Chengming Zhang authored at least 20 papers between 2020 and 2024.

Bibliography

2024
Sequence Length Scaling in Vision Transformers for Scientific Images on Frontier.
CoRR, 2024

System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
Proceedings of the 43rd ACM Symposium on Principles of Distributed Computing, 2024

2023
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models.
CoRR, 2023

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates.
CoRR, 2022

CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression.
Proc. VLDB Endow., 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021

Improving DNN Fault Tolerance using Weight Pruning and Differential Crossbar Mapping for ReRAM-based Edge AI.
Proceedings of the 22nd International Symposium on Quality Electronic Design, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020
An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.
CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
CoRR, 2020

waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

CurvaNet: Geometric Deep Learning based on Directional Curvature for 3D Shape Analysis.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

