Guiming Wu

Orcid: 0000-0002-6703-3195

According to our database, Guiming Wu authored at least 26 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Acceleration of Multi-Body Molecular Dynamics With Customized Parallel Dataflow.
IEEE Trans. Parallel Distributed Syst., December, 2024

MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof.
IACR Cryptol. ePrint Arch., 2024

MSMAC: Accelerating Multi-Scalar Multiplication for Zero-Knowledge Proof.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Topgun: An ECC Accelerator for Private Set Intersection.
ACM Trans. Reconfigurable Technol. Syst., December, 2023

E-Booster: A Field-Programmable Gate Array-Based Accelerator for Secure Tree Boosting Using Additively Homomorphic Encryption.
IEEE Micro, 2023

A High-Performance Hardware Architecture for ECC Point Multiplication over Curve25519.
Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

Research Advances and Future Challenges of FPGA-based High Performance Computing (in Chinese).
Computer Science (计算机科学), 2019

A High-Performance Accelerator for Floating-Point Matrix Multiplication.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

Hardware Implementation of Scalar Multiplication on Elliptic Curves over GF(2^m) (in Chinese).
Computer Science (计算机科学), 2015

Sparse Matrix Blocking Method for Custom Architecture (in Chinese).
Computer Science (计算机科学), 2015

A deeply-pipelined FPGA-based SpMV accelerator with a hardware-friendly storage scheme.
IEICE Electron. Express, 2015

High-Performance Architecture for the Conjugate Gradient Solver on FPGAs.
IEEE Trans. Circuits Syst. II Express Briefs, 2013

A High Performance and Memory Efficient LU Decomposer on FPGAs.
IEEE Trans. Computers, 2012

Parallelizing sparse LU decomposition on FPGAs.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

A Unified Co-Processor Architecture for Matrix Decomposition.
J. Comput. Sci. Technol., 2010

FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing.
Proceedings of the 24th International Conference on Supercomputing, 2010

Automatic synthesis of processor arrays with local memories on FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2010

High performance and memory efficient implementation of matrix multiplication on FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2010

Blocking LU Decomposition for FPGAs.
Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010

A coarse-grained reconfigurable computing architecture with loop self-pipelining.
Sci. China Ser. F Inf. Sci., 2009

Exploiting Fine-Grained Pipeline Parallelism for Wavefront Computations on Multicore Platforms.
Proceedings of the ICPPW 2009, 2009

A Fine-grained Pipelined Implementation of the LINPACK Benchmark on FPGAs.
Proceedings of the FCCM 2009, 2009

Computation rotating for data reuse.
Proceedings of the 13th Asia-Pacific Computer Systems Architecture Conference, 2008

Instruction Selection for Subword Level Parallelism Optimizations for Application Specific Instruction Processors.
Proceedings of the Parallel and Distributed Processing and Applications, 2007

The Implementation of a Coarse-Grained Reconfigurable Architecture with Loop Self-pipelining.
Proceedings of the Reconfigurable Computing: Architectures, 2007

Designing a Coarse-Grained Reconfigurable Architecture Using Loop Self-Pipelining.
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006
