2024

Exploiting Fine-Grained Redundancy in Set-Centric Graph Pattern Mining.

Zhiheng Lin

Ke Meng

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations.

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

2023

Accelerating k-Shape Time Series Clustering Algorithm Using GPU.

IEEE Trans. Parallel Distributed Syst., October, 2023

SI on parallel system and algorithm optimization.

Liang Yuan

Junmin Xiao

CCF Trans. High Perform. Comput., September, 2023

AGCM-3DLF: Accelerating Atmospheric General Circulation Model via 3-D Parallelization and Leap-Format.

IEEE Trans. Parallel Distributed Syst., March, 2023

Adaptive Workload-Balanced Scheduling Strategy for Global Ocean Data Assimilation on Massive GPUs.

Proceedings of the International Conference for High Performance Computing, 2023

GraphPar: Efficient Workload-Aware Subgraph Matching System on Multiple GPUs.

Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

2022

Fast and accurate variable batch size convolution neural network training on large scale distributed systems.

Concurr. Comput. Pract. Exp., 2022

W-Cycle SVD: A Multilevel Algorithm for Batched SVD on GPUs.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

A W-cycle algorithm for efficient batched SVD on GPUs.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

MegTaiChi: dynamic tensor-based memory management optimization for DNN training.

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

2021

I/O lower bounds for auto-tuning of convolutions in CNNs.

Xiaoyang Zhang

Junmin Xiao

Guangming Tan

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

2020

Fast Data-Obtaining Algorithm for Data Assimilation with Large Data Set.

Int. J. Parallel Program., 2020

Communication Lower Bounds of Convolutions in CNNs.

Xiaoyang Zhang

Junmin Xiao

Guangming Tan

Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

2019

Trade-offs between computation, communication, and synchronization in stencil-collective alternate update.

Junmin Xiao

Jian Peng

CCF Trans. High Perform. Comput., 2019

S-EnKF: co-designing for scalable ensemble Kalman filter.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Tensor Layout Optimization of Convolution for Inference on Digital Signal Processor.

Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

A Variable Batch Size Strategy for Large Scale Distributed DNN Training.

Proceedings of the 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2019

2018

Communication-Avoiding for Dynamical Core of Atmospheric General Circulation Model.

Proceedings of the 47th International Conference on Parallel Processing, 2018

AGCM3D: A Highly Scalable Finite-Difference Dynamical Core of Atmospheric General Circulation Model Based on 3D Decomposition.

Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

2013

Multilevel correction for collocation solutions of Volterra integral equations with proportional delays.

Junmin Xiao

Qiya Hu

Adv. Comput. Math., 2013