2025
Optimizing Nuclear Configuration Interaction Calculations on GPUs: A Comparative Performance Study of Programming Models.
Proceedings of the ISC High Performance 2025 Research Paper Proceedings (40th International Conference), 2025
Maximizing Power-Constrained Supercomputing Throughput.
Proceedings of the ISC High Performance 2025 Research Paper Proceedings (40th International Conference), 2025
2024
Evaluating the potential of disaggregated memory systems for HPC applications.
Concurr. Comput. Pract. Exp., August, 2024
Performance Modeling and Analysis of a de Bruijn Graph Based Local Assembly Kernel on Multiple Vendor GPUs.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
A Workflow Roofline Model for End-to-End Workflow Performance Analysis.
Proceedings of the International Conference for High Performance Computing, 2024
2023
Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters.
Proceedings of the International Conference for High Performance Computing, 2023
Evaluating the Performance of One-sided Communication on CPUs and GPUs.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
2022
Instruction Roofline: An insightful visual performance model for GPUs.
Concurr. Comput. Pract. Exp., 2022
A Methodology for Evaluating Tightly-integrated and Disaggregated Accelerated Architectures.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022
2021
Accelerating large scale de novo metagenome assembly using GPUs.
Proceedings of the International Conference for High Performance Computing, 2021
Evaluating Performance and Portability of a core bioinformatics kernel on multiple vendor GPUs.
Proceedings of the International Workshop on Performance, 2021
A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver.
Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021
2020
APMT: an automatic hardware counter-based performance modeling tool for HPC applications.
CCF Trans. High Perform. Comput., 2020
Leveraging One-Sided Communication for Sparse Triangular Solvers.
Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020
LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
GPU accelerated partial order multiple sequence alignment for long reads self-correction.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
2019
An automatic performance model-based scheduling tool for coupled climate system models.
J. Parallel Distributed Comput., 2019
An Instruction Roofline Model for GPUs.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019
2017
Redesigning CAM-SE for peta-scale climate modeling performance and ultra-high resolution on Sunway TaihuLight.
Proceedings of the International Conference for High Performance Computing, 2017
2016
Refactoring and optimizing the community atmosphere model (CAM) on the Sunway TaihuLight supercomputer.
Proceedings of the International Conference for High Performance Computing, 2016
2014
CESMTuner: An Auto-tuning Framework for the Community Earth System Model.
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014