2022
FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours.
CoRR, 2022

2021
tcFFT: Accelerating Half-Precision FFT through Tensor Cores.
CoRR, 2021

tcFFT: A Fast Half-Precision FFT Library for NVIDIA Tensor Cores.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2019
An Empirical Study of HPC Workloads on Huawei Kunpeng 916 Processor.
Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019