Huanqi Cao

ORCID: 0000-0002-3870-106X

According to our database, Huanqi Cao authored at least 18 papers between 2017 and 2024.

Bibliography

2024
Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer.
Parallel Comput., 2024

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.
CoRR, 2024

AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid.
Proc. ACM Program. Lang., October, 2023

TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs.
ACM Trans. Storage, May, 2023

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR.
CoRR, 2023

RWKV: Reinventing RNNs for the Transformer Era.
CoRR, 2023


2022
Design and Implementation of ShenWei Universal C/C++.
CoRR, 2022

Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver.
CoRR, 2022

Scaling Graph 500 SSSP to 140 Trillion Edges with over 40 Million Cores.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

BaGuaLu: targeting brain scale pretrained models with over 37 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Scaling graph traversal to 281 trillion edges with 40 million cores.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

2021
Chukonu: A Fully-Featured Big Data Processing System by Efficiently Integrating a Native Compute Engine into Spark.
Proc. VLDB Endow., 2021

CPM: A large-scale generative Chinese Pre-trained language model.
AI Open, 2021

Sparker: Efficient Reduction for More Scalable Machine Learning with Spark.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2019
T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2017
A hierarchical grid algorithm for accelerating high-performance conjugate gradient benchmark on sunway many-core processor.
Proceedings of the 3rd International Conference on Communication and Information Processing, 2017

