Junzhong Shen

ORCID: 0000-0001-6233-6800

According to our database, Junzhong Shen authored at least 27 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
ABS: Accumulation Bit-Width Scaling Method for Designing Low-Precision Tensor Core.
IEEE Trans. Very Large Scale Integr. Syst., September, 2024

HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Enhancing the PE Utilization for Multi-Precision Systolic Array via Optimizing Computation Latency.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2024

MACO: Exploring GEMM Acceleration on a Loosely-Coupled Multi-Core Processor.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

BitShare: An Efficient Precision-Scalable Accelerator with Combining-Like-Terms GEMM.
Proceedings of the 35th IEEE International Conference on Application-specific Systems, 2024

2022
TILE-SIM: A Systematic Approach to Systolic Array-based Accelerator Evaluation.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2022

S-SIM: A Simulator for Systolic Array-based DNN Accelerators with Tile Access Awareness.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2022

Mentha: Enabling Sparse-Packing Computation on Systolic Arrays.
Proceedings of the 51st International Conference on Parallel Processing, 2022

BP-Im2col: Implicit Im2col Supporting AI Backpropagation on Systolic Arrays.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022

MZ Core: An Enhanced Matrix Acceleration Engine for HPC/ AI Applications.
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022

2021
SAI: Self-Adjusting Incremental Quantile Estimation for Sparse Training of Neural Networks on Hardware Accelerators.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

Embrace the Conflicts: Exploring the Integration of Single Port Memory in Systolic Array-based Accelerators.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

2020
Toward an Efficient Deep Pipelined Template-Based Architecture for Accelerating the Entire 2-D and 3-D CNNs on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

P4 to FPGA - A Fast Approach for Generating Efficient Network Processors.
IEEE Access, 2020

Towards a Deep-Pipelined Architecture for Accelerating Deep GCN on a Multi-FPGA Platform.
Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, 2020

Scalable FPGA-based Architecture for High-Performance Per-Flow Traffic Measurement.
Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Towards Memory-Efficient Streaming Processing with Counter-Cascading Sketching on FPGA.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

2019
Towards a Uniform Architecture for the Efficient Implementation of 2D and 3D Deconvolutional Neural Networks on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

An Efficient Design Flow for Accelerating Complicated-connected CNNs on a Multi-FPGA Platform.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Accelerating 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Scale-out Acceleration for 3D CNN-based Lung Nodule Segmentation on a Multi-FPGA System.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018
MALMM: A multi-array architecture for large-scale matrix multiplication on FPGA.
IEICE Electron. Express, 2018

Towards a Multi-array Architecture for Accelerating Large-scale Matrix Multiplication on FPGAs.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

2017
FPGA-accelerated deep convolutional neural networks for high throughput and energy efficiency.
Concurr. Comput. Pract. Exp., 2017

Optimizing OpenCL Implementation of Deep Convolutional Neural Network on FPGA.
Proceedings of the IFIP International Conference on Network and Parallel Computing, 2017

2015
Unified Virtual Memory Support for Deep CNN Accelerator on SoC FPGA.
Proceedings of the International Conference on Algorithms and Architectures for Parallel Processing, 2015
