Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data.
CoRR, 2024
Aquila2 Technical Report.
CoRR, 2024
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies.
CoRR, 2024
Adaptive SpMV/SpMSpV on GPUs for Input Vectors of Varied Sparsity.
IEEE Trans. Parallel Distributed Syst., 2021
End-to-end Adaptive Distributed Training on PaddlePaddle.
CoRR, 2021
AutoWM: a novel domain-specific tool for universal multi-/many-core accelerations of the WRF cloud microphysics.
Clust. Comput., 2021
Solving a trillion unknowns per second with HPGMG on Sunway TaihuLight.
Clust. Comput., 2020
Enabling Highly Efficient k-Means Computations on the SW26010 Many-Core Processor of Sunway TaihuLight.
J. Comput. Sci. Technol., 2019
Extreme-Scale High-Order WENO Simulations of 3-D Detonation Wave with 10 Million Cores.
ACM Trans. Archit. Code Optim., 2018
Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer.
ACM Trans. Archit. Code Optim., 2018
A Fast Sparse Triangular Solver for Structured-grid Problems on Sunway Many-core Processor SW26010.
Proceedings of the 47th International Conference on Parallel Processing, 2018
Extreme-Scale Realistic Stencil Computations on Sunway TaihuLight with Ten Million Cores.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2018
26 PFLOPS Stencil Computations for Atmospheric Modeling on Sunway TaihuLight.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Towards Highly Efficient DGEMM on the Emerging SW26010 Many-Core Processor.
Proceedings of the 46th International Conference on Parallel Processing, 2017
10M-core scalable fully-implicit solver for nonhydrostatic atmospheric dynamics.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2016
Pattern-Driven Hybrid Multi- and Many-Core Acceleration in the MPAS Shallow-Water Model.
Proceedings of the 44th International Conference on Parallel Processing, 2015
Performance Evaluation of HPGMG on Tianhe-2: Early Experience.
Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, 2015