Peng Chen

Orcid: 0000-0003-1244-3151

Affiliations:
  • National Institute of Advanced Industrial Science and Technology, Japan, RIKEN Center for Computational Science, Tokyo, Japan
  • Tokyo Institute of Technology, AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Japan (PhD 2020)


According to our database, Peng Chen authored at least 22 papers between 2018 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Evolutionary Architecture Search for Generative Adversarial Networks Based on Weight Sharing.
IEEE Trans. Evol. Comput., June, 2024

Adaptive Patching for High-resolution Image Segmentation with Transformers.
CoRR, 2024

Real-time High-resolution X-Ray Computed Tomography.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Communication Optimization for Distributed GCN Training on ABCI Supercomputer.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Investigating Nvidia GPU Architecture Trends via Microbenchmarks.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Asynchronous I/O Optimization for X-Ray Imaging via GPUDirect Storage.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023

Simeuro: A Hybrid CPU-GPU Parallel Simulator for Neuromorphic Computing Chips.
IEEE Trans. Parallel Distributed Syst., October, 2023

Ultra-Long Sequence Distributed Transformer.
CoRR, 2023

Revisiting Temporal Blocking Stencil Optimizations.
Proceedings of the 37th International Conference on Supercomputing, 2023

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications.
Proceedings of the 37th International Conference on Supercomputing, 2023

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt).
Proceedings of the 15th Workshop on General Purpose Processing Using GPU, 2023

2022
Automatic Generation of High-Performance Convolution Kernels on ARM CPUs for Deep Learning.
IEEE Trans. Parallel Distributed Syst., 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022

Persistent Kernels for Iterative Memory-bound GPU Applications.
CoRR, 2022

Image Gradient Decomposition for Parallel and Memory-Efficient Ptychographic Reconstruction.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

2021
Scalable FBP decomposition for cone-beam CT reconstruction.
Proceedings of the International Conference for High Performance Computing, 2021

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Performance portable back-projection algorithms on CPUs: agnostic data locality and vectorization optimizations.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2019
iFDK: a scalable framework for instant high-resolution image reconstruction.
Proceedings of the International Conference for High Performance Computing, 2019

A versatile software systolic execution model for GPU memory-bound kernels.
Proceedings of the International Conference for High Performance Computing, 2019

2018
Efficient Algorithms for the Summed Area Tables Primitive on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

