Shenggan Cheng

ORCID: 0000-0002-7966-2941

According to our database, Shenggan Cheng authored at least 16 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem.
CoRR, 2024

DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers.
CoRR, 2024

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.
CoRR, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference.
CoRR, 2024

Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

HeteGen: Efficient Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices.
Proceedings of the Seventh Annual Conference on Machine Learning and Systems, 2024

AutoChunk: Automated Activation Chunk for Memory-Efficient Deep Learning Inference.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
ATP: Adaptive Tensor Parallelism for Foundation Models.
CoRR, 2023

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023

2022
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours.
CoRR, 2022

2021
tcFFT: Accelerating Half-Precision FFT through Tensor Cores.
CoRR, 2021

tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
HMS-Net: Hierarchical Multi-Scale Sparsity-Invariant Network for Sparse Depth Completion.
IEEE Trans. Image Process., 2020

FTL: A Universal Framework for Training Low-Bit DNNs via Feature Transfer.
Proceedings of Computer Vision - ECCV 2020, 2020

CUBE - Towards an Optimal Scaling of Cosmological N-body Simulations.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, 2020

