FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours.
CoRR, 2022
tcFFT: Accelerating Half-Precision FFT through Tensor Cores.
CoRR, 2021
tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores.
Proceedings of the IEEE International Conference on Cluster Computing, 2021
An Empirical Study of HPC Workloads on Huawei Kunpeng 916 Processor.
Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019