Huanqi Cao
ORCID: 0000-0002-3870-106X
According to our database, Huanqi Cao authored at least 18 papers between 2017 and 2024.
Bibliography
2024
Extending the limit of LR-TDDFT on two different approaches: Numerical algorithms and new Sunway heterogeneous supercomputer.
Parallel Comput., 2024
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention.
CoRR, 2024
AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid.
Proc. ACM Program. Lang., October, 2023
TriCache: A User-Transparent Block Cache Enabling High-Performance Out-of-Core Processing with In-Memory Programs.
ACM Trans. Storage, May, 2023
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR.
CoRR, 2023
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
2022
Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver.
CoRR, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022
2021
Chukonu: A Fully-Featured Big Data Processing System by Efficiently Integrating a Native Compute Engine into Spark.
Proc. VLDB Endow., 2021
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
2019
T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
2017
A hierarchical grid algorithm for accelerating high-performance conjugate gradient benchmark on Sunway many-core processor.
Proceedings of the 3rd International Conference on Communication and Information Processing, 2017