Peng Zhang

ORCID: 0000-0001-8364-9793

Affiliations:
  • National University of Defense Technology, Software Institute, College of Computer, Compiler Laboratory, Changsha, China


According to our database, Peng Zhang authored at least 16 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
thSORT: an efficient parallel sorting algorithm on multi-core DSPs.
CCF Trans. High Perform. Comput., October, 2024

Optimizing General Matrix Multiplications on Modern Multi-core DSPs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Optimizing Stencil Computation on Multi-core DSPs.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

Optimizing SpMV on Heterogeneous Multi-Core DSPs through Improved Locality and Vectorization.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

2023
Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000.
Frontiers Inf. Technol. Electron. Eng., 2023

Optimizing Direct Convolutions on ARM Multi-Cores.
Proceedings of the International Conference for High Performance Computing, 2023

The Optimization of Multi-physics Application Simulated by Lattice Boltzmann Method Based on Domestic Processors.
Proceedings of the 2nd International Conference on Networks, 2023

2021
Large-Scale Parallel Alignment Algorithm for SMRT Reads.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2021

2020
Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures.
IEEE Trans. Parallel Distributed Syst., 2020

Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures: A Machine Learning Based Approach.
CoRR, 2020

2019
The Communication-Overlapped Hybrid Decomposition Parallel Algorithm for Multi-Scale Fluid Simulations.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Tuning Streamed Applications on Intel Xeon Phi: A Machine Learning Based Approach.
CoRR, 2018

Auto-tuning Streamed Applications on Intel Xeon Phi.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

MOCL: an efficient OpenCL implementation for the Matrix-2000 architecture.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018

2017
Implementing and Evaluating OpenCL on an ARMv8 Multi-Core CPU.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017

2016
Evaluating Multiple Streams on Heterogeneous Platforms.
Parallel Process. Lett., 2016

