Huanqi Cao

Orcid: 0000-0002-3870-106X

According to our database, Huanqi Cao authored at least 18 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer.

Parallel Comput., 2024

SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.

CoRR, 2024

AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid.

Proc. ACM Program. Lang., October, 2023

TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs.

ACM Trans. Storage, May, 2023

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR.

CoRR, 2023

RWKV: Reinventing RNNs for the Transformer Era.

CoRR, 2023

RWKV: Reinventing RNNs for the Transformer Era.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022

Design and Implementation of ShenWei Universal C/C++.

Huanqi Cao

Jiajie Chen

CoRR, 2022

Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver.

CoRR, 2022

Scaling Graph 500 SSSP to 140 Trillion Edges with over 40 Million Cores.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

BaGuaLu: targeting brain scale pretrained models with over 37 million cores.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Scaling graph traversal to 281 trillion edges with 40 million cores.

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

2021

Chukonu: A Fully-Featured Big Data Processing System by Efficiently Integrating a Native Compute Engine into Spark.

Proc. VLDB Endow., 2021

CPM: A large-scale generative Chinese Pre-trained language model.

AI Open, 2021

Sparker: Efficient Reduction for More Scalable Machine Learning with Spark.

Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2019

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.

Nitish Kumar Srivastava

Christopher J. Hughes

Timothy G. Mattson

Pradeep Dubey

Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

2017

A hierarchical grid algorithm for accelerating high-performance conjugate gradient benchmark on sunway many-core processor.

Proceedings of the 3rd International Conference on Communication and Information Processing, 2017
