2025
29-Billion Atoms Molecular Dynamics Simulation With Ab Initio Accuracy on 35 Million Cores of New Sunway Supercomputer.
IEEE Trans. Computers, May, 2025
An interpretable DeePMD-kit performance model for emerging supercomputers.
CCF Trans. High Perform. Comput., April, 2025
Efficient Long Context Fine-tuning with Chunk Flow.
CoRR, March, 2025
Large Scale Finite-Temperature Real-time Time Dependent Density Functional Theory Calculation with Hybrid Functional on ARM and GPU Systems.
CoRR, January, 2025
Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism.
Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2025
2024
10-Million Atoms Simulation of First-Principle Package LS3DF.
J. Comput. Sci. Technol., March, 2024
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs.
CoRR, 2024
Mille-feuille: A Tile-Grained Mixed Precision Single-Kernel Conjugate Gradient Solver on GPUs.
Proceedings of the International Conference for High Performance Computing, 2024
Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day.
Proceedings of the International Conference for High Performance Computing, 2024
Training one DeePMD Model in Minutes: a Step towards Online Learning.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024
2023
Enhance the Strong Scaling of LAMMPS on Fugaku.
Proceedings of the International Conference for High Performance Computing, 2023
RLEKF: An Optimizer for Deep Potential with Ab Initio Accuracy.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms.
CoRR, 2022
2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
2021
Deep Density: Circumventing the Kohn-Sham equations via symmetry preserving neural networks.
J. Comput. Phys., 2021
86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy.
Comput. Phys. Commun., 2021
Editorial for the special issue on large-scale AI in classical HPC environment and AI for science.
CCF Trans. High Perform. Comput., 2021
2020
ELSI - An open infrastructure for electronic structure solvers.
Comput. Phys. Commun., 2020
86 PFLOPS Deep Potential Molecular Dynamics simulation of 100 million atoms with ab initio accuracy.
CoRR, 2020
Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning.
Proceedings of the International Conference for High Performance Computing, 2020
2019
Fast real-time time-dependent hybrid functional calculations with the parallel transport gauge and the adaptively compressed exchange formulation.
Comput. Phys. Commun., 2019
Parallel transport time-dependent density functional theory calculations with hybrid functional on summit.
Proceedings of the International Conference for High Performance Computing, 2019
2018
ELSI: A unified software interface for Kohn-Sham electronic structure solvers.
Comput. Phys. Commun., 2018
A Left-Looking Selected Inversion Algorithm and Task Parallelism on Shared Memory Systems.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018
2017
GPU implementation of the linear scaling three dimensional fragment method for large scale electronic structure calculations.
Comput. Phys. Commun., 2017
SGO: A fast engine for ab initio atomic structure global optimization by differential evolution.
Comput. Phys. Commun., 2017
2013
Fast plane wave density functional theory molecular dynamics calculations on multi-GPU machines.
J. Comput. Phys., 2013
The analysis of a plane wave pseudopotential density functional theory code on a GPU machine.
Comput. Phys. Commun., 2013
2011
Large scale plane wave pseudopotential density functional theory calculations on GPU clusters.
Proceedings of the Conference on High Performance Computing Networking, 2011