Hongzhong Zheng

Orcid: 0000-0001-7696-9799

According to our database, Hongzhong Zheng authored at least 47 papers between 2007 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Efficient Super-Resolution System With Block-Wise Hybridization and Quantized Winograd on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., November, 2023

Accelerating Distributed GNN Training by Codes.
IEEE Trans. Parallel Distributed Syst., September, 2023

MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing.
ACM Trans. Archit. Code Optim., September, 2023

NPS: A Framework for Accurate Program Sampling Using Graph Neural Network.
CoRR, 2023

TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

ArchExplorer: Microarchitecture Exploration Via Bottleneck Analysis.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Klotski: DNN Model Orchestration Framework for Dataflow Architecture Accelerators.
Proceedings of the IEEE/ACM International Conference on Computer Aided Design, 2023

Spada: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
EPQuant: A Graph Neural Network compression approach based on product quantization.
Neurocomputing, 2022

Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network.
IEEE Access, 2022

Accelerating CPU-Based Sparse General Matrix Multiplication With Binary Row Merging.
IEEE Access, 2022

OpSparse: A Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs.
IEEE Access, 2022

COMB-MCM: Computing-on-Memory-Boundary NN Processor with Bipolar Bitwise Sparsity Optimization for Scalable Multi-Chiplet-Module Edge Machine Learning.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

184QPS/W 64Mb/mm² 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio.
Proceedings of the 28th IEEE International Conference on Parallel and Distributed Systems, 2022

Enabling High-Quality Uncertainty Quantification in a PIM Designed for Bayesian Neural Network.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
STAR: Synthesis of Stateful Logic in RRAM Targeting High Area Utilization.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

DLUX: A LUT-Based Near-Bank Accelerator for Data Center Deep Learning Training Workloads.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Area Efficient Pattern Representation of Binary Neural Networks on RRAM.
J. Comput. Sci. Technol., 2021

MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing.
CoRR, 2021

2020
GNN-PIM: A Processing-in-Memory Architecture for Graph Neural Networks.
Proceedings of the Advanced Computer Architecture - 13th Conference, 2020

2019
CoNDA: efficient cache coherence support for near-data accelerators.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

2018
SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Performance Impact of Emerging Memory Technologies on Big Data Applications: A Latency-Programmable System Emulation Approach.
Proceedings of the 2018 on Great Lakes Symposium on VLSI, 2018

2017
DRAF: A Low-Power DRAM-Based Reconfigurable Acceleration Fabric.
IEEE Micro, 2017

LazyPIM: Efficient Support for Cache Coherence in Processing-in-Memory Architectures.
CoRR, 2017

LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory.
IEEE Comput. Archit. Lett., 2017

Architecting HBM as a high bandwidth, high capacity, self-managed last-level cache.
Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 2017

FlashStorageSim: Performance Modeling for SSD Architectures.
Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

DRISA: a DRAM-based reconfigurable in-situ accelerator.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

2016
MEMRES: A Fast Memory System Reliability Simulator.
IEEE Trans. Reliab., 2016

Software-Defined Emulation Infrastructure for High Speed Storage.
Proceedings of the 9th ACM International on Systems and Storage Conference, 2016

DRAMScale: Mechanisms to Increase DRAM Capacity.
Proceedings of the Second International Symposium on Memory Systems, 2016

DRAMPersist: Making DRAM Systems Persistent.
Proceedings of the Second International Symposium on Memory Systems, 2016

FlexDrive: A Framework to Explore NVMe Storage Solutions.
Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

2015
FAME: A Fast and Accurate Memory Emulator for New Memory System Architecture Exploration.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

2014
Mini-Rank: A Power-Efficient DDRx DRAM Memory Architecture.
IEEE Trans. Computers, 2014

2013
Thermal Modeling and Management of DRAM Systems.
IEEE Trans. Computers, 2013

2010
Power and Performance Trade-Offs in Contemporary DRAM System Designs for Multicore Processors.
IEEE Trans. Computers, 2010

Heterogeneous Mini-rank: Adaptive, Power-Efficient Memory Architecture.
Proceedings of the 39th International Conference on Parallel Processing, 2010

2009
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices.
Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

2008
Software thermal management of DRAM memory for multicore systems.
Proceedings of the 2008 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2008

Mini-rank: Adaptive DRAM architecture for improving memory power efficiency.
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008

Memory Access Scheduling Schemes for Systems with Multi-Core Processors.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

2007
DRAM-Level Prefetching for Fully-Buffered DIMM: Design, Performance and Power Saving.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Thermal modeling and management of DRAM memory systems.
Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

