2024
Exploiting Fine-Grained Redundancy in Set-Centric Graph Pattern Mining.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
2023
Accelerating k-Shape Time Series Clustering Algorithm Using GPU.
IEEE Trans. Parallel Distributed Syst., October, 2023
SI on parallel system and algorithm optimization.
CCF Trans. High Perform. Comput., September, 2023
AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3-D Parallelization and Leap-Format.
IEEE Trans. Parallel Distributed Syst., March, 2023
Adaptive Workload-Balanced Scheduling Strategy for Global Ocean Data Assimilation on Massive GPUs.
Proceedings of the International Conference for High Performance Computing, 2023
GraphPar: Efficient Workload-Aware Subgraph Matching System on Multiple GPUs.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023
2022
Fast and accurate variable batch size convolution neural network training on large scale distributed systems.
Concurr. Comput. Pract. Exp., 2022
W-Cycle SVD: A Multilevel Algorithm for Batched SVD on GPUs.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
A W-cycle algorithm for efficient batched SVD on GPUs.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
MegTaiChi: dynamic tensor-based memory management optimization for DNN training.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
2021
I/O lower bounds for auto-tuning of convolutions in CNNs.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
2020
Fast Data-Obtaining Algorithm for Data Assimilation with Large Data Set.
Int. J. Parallel Program., 2020
Communication Lower Bounds of Convolutions in CNNs.
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020
2019
Trade-offs between computation, communication, and synchronization in stencil-collective alternate update.
CCF Trans. High Perform. Comput., 2019
S-EnKF: co-designing for scalable ensemble Kalman filter.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Tensor Layout Optimization of Convolution for Inference on Digital Signal Processor.
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019
A Variable Batch Size Strategy for Large Scale Distributed DNN Training.
Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019
2018
Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model.
Proceedings of the 47th International Conference on Parallel Processing, 2018
AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018
2013
Multilevel correction for collocation solutions of Volterra integral equations with proportional delays.
Adv. Comput. Math., 2013