2025

CSA-FCN: Channel- and Spatial-Gated Attention Mechanism Based Fully Complex-Valued Neural Network for System Matrix Calibration in Magnetic Particle Imaging.

IEEE Trans. Computational Imaging, 2025

2024

Monotone Accelerated Proximal Gradient Network For Bioluminescence Tomography Reconstruction.

Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

FMT-ReconNet: Fluorescence Molecular Tomography Reconstruction using Prior Knowledge and Deformation Neural Network.

Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024

2023

E-Booster: A Field-Programmable Gate Array-Based Accelerator for Secure Tree Boosting Using Additively Homomorphic Encryption.

Zhenxiang Zhang et al.

IEEE Micro, 2023

Multi-target reconstruction of fluorescence molecular tomography based on blind source separation.

Proceedings of the Medical Imaging 2023: Image Processing, 2023

Based on model-driven fast iterative shrinkage thresholding network for bioluminescence tomography reconstruction.

Proceedings of the Medical Imaging 2023: Image Processing, 2023

DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization.

Hongzhong Zheng et al.

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Structure-fused deep 3D hierarchical network: A bioluminescence tomography scheme for different imaging objects.

Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

2022

Efficient Processing of Sparse Tensor Decomposition via Unified Abstraction and PE-Interactive Architecture.

IEEE Trans. Computers, 2022

EPQuant: A Graph Neural Network compression approach based on product quantization.

Hongzhong Zheng et al.

Neurocomputing, 2022

Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network.

Hongzhong Zheng et al.

IEEE Access, 2022

184QPS/W 64Mb/mm² 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System.

Hongzhong Zheng et al.

Proceedings of the IEEE International Solid-State Circuits Conference, 2022

Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network.

Hongzhong Zheng et al.

Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

2021

DLUX: A LUT-Based Near-Bank Accelerator for Data Center Deep Learning Training Workloads.

Hongzhong Zheng, Krishna T. Malladi, et al.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.

Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021

L1-L2 Minimization Via A Proximal Operator For Fluorescence Molecular Tomography.

Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

End-To-End Bioluminescence Tomography Reconstruction Based On Convolution Neural Network Scheme.

Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021

2020

NNBench-X: A Benchmarking Methodology for Neural Network Accelerator Designs.

ACM Trans. Archit. Code Optim., 2020

Tianjic: A Unified and Scalable Chip Bridging Spike-Based and Continuous Neural Computation.

IEEE J. Solid State Circuits, 2020

GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs.

CoRR, 2020

DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Boosting Deep Neural Network Efficiency with Dual-Module Inference.

Proceedings of the 37th International Conference on Machine Learning, 2020

NEST: DIMM based Near-Data-Processing Accelerator for K-mer Counting.

Krishna T. Malladi et al.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU.

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators.

Marzieh Lenjani, Patricia Gonzalez-Guerrero, Elaheh Sadredini, et al.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints.

Timothy Sherwood et al.

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.

IEEE Trans. Parallel Distributed Syst., 2019

Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints.

Timothy Sherwood et al.

CoRR, 2019

NNBench-X: Benchmarking and Understanding Neural Network Workloads for Accelerator Designs.

IEEE Comput. Archit. Lett., 2019

Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm.

Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators.

Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019

Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads.

Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

CNNWire: Boosting Convolutional Neural Network with Winograd on ReRAM based Accelerators.

Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019

Memory Trojan Attack on Neural Network Accelerators.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Near-Data Acceleration of Privacy-Preserving Biomarker Search with 3D-Stacked Memory.

Alvin Oliver Glova et al.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Memory-Bound Proof-of-Work Acceleration for Blockchain Applications.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture.

Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

AERIS: area/energy-efficient 1T2R ReRAM based processing-in-memory neural network system-on-a-chip.

Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019

2018

Memory-Centric Architectures: Bridging the Gap Between Compute and Memory.

PhD thesis, 2018

Securing Emerging Nonvolatile Main Memory With Fast and Energy-Efficient AES In-Memory Implementation.

Alvin Oliver Glova et al.

IEEE Trans. Very Large Scale Integr. Syst., 2018

In-memory multiplication engine with SOT-MRAM based stochastic computing.

CoRR, 2018

Exploring Core and Cache Hierarchy Bottlenecks in Graph Processing Workloads.

IEEE Comput. Archit. Lett., 2018

SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator.

Alvin Oliver Glova, Krishna T. Malladi, Hongzhong Zheng, et al.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Persistence Parallelism Optimization: A Holistic Approach from Memory Bus to RDMA Network.

Matheus Ogleari et al.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

GraphIA: an in-situ accelerator for large-scale graph processing.

Proceedings of the International Symposium on Memory Systems, 2018

AIM: Fast and energy-efficient AES in-memory implementation for emerging non-volatile main memory.

Alvin Oliver Glova et al.

Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

RADAR: a 3D-reRAM based DNA alignment accelerator architecture.

Proceedings of the 55th Annual Design Automation Conference, 2018

Cost-efficient 3D Integration to Hinder Reverse Engineering During and After Manufacturing.

Prashansa Mukim et al.

Proceedings of the Asian Hardware Oriented Security and Trust Symposium, 2018

2017

DRISA: a DRAM-based reconfigurable in-situ accelerator.

Krishna T. Malladi, Hongzhong Zheng, et al.

Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Building energy-efficient multi-level cell STT-RAM caches with data compression.

Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016

A C2RTL Framework Supporting Partition, Parallelization, and FIFO Sizing for Streaming Applications.

Xiaobo Sharon Hu et al.

ACM Trans. Design Autom. Electr. Syst., 2016

Nonvolatile Processor Architectures: Efficient, Reliable Progress with Unstable Power.

Karthik Swaminathan, John (Jack) Morgan Sampson, Vijaykrishnan Narayanan, et al.

IEEE Micro, 2016

NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory.

Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016

NVSim-CAM: a circuit-level simulator for emerging nonvolatile memory based content-addressable memory.

Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Leveraging 3D Technologies for Hardware Security: Opportunities and Challenges.

Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016

Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories.

Proceedings of the 53rd Annual Design Automation Conference, 2016

Architecture design with STT-RAM: Opportunities and challenges.

Seung-Hyuk Kang et al.

Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015

Nonvolatile Processor Architecture Exploration for Energy-Harvesting Applications.

John (Jack) Morgan Sampson, Vijaykrishnan Narayanan, et al.

IEEE Micro, 2015

Leveraging nonvolatility for architecture design with emerging NVM.

Kwang-Ting Cheng et al.

Proceedings of the IEEE Non-Volatile Memory System and Applications Symposium, 2015

Leveraging emerging nonvolatile memory in high-level synthesis with loop transformations.

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Architecture exploration for ambient energy harvesting nonvolatile processors.

Karthik Swaminathan, Vijaykrishnan Narayanan, et al.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Ambient energy harvesting nonvolatile processors: from circuit to system.

Proceedings of the 52nd Annual Design Automation Conference, 2015

Nonvolatile memory allocation and hierarchy optimization for high-level synthesis.

Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014

PaCC: A Parallel Compare and Compress Codec for Area Reduction in Nonvolatile Processors.

Mei-Fang Chiang, Xiaobo Sharon Hu, et al.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Intra-task scheduling for storage-less and converter-less solar-powered nonvolatile sensor nodes.

Xiaobo Sharon Hu et al.

Proceedings of the 32nd IEEE International Conference on Computer Design, 2014

2013

Utilizing voltage-frequency islands in C-to-RTL synthesis for streaming applications.

Xiaobo Sharon Hu et al.

Proceedings of the Design, Automation and Test in Europe, 2013

Optimal partition with block-level parallelization in C-to-RTL synthesis for streaming applications.

Xiaobo Sharon Hu et al.

Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013

2012

A 3us wake-up time nonvolatile processor based on ferroelectric flip-flops.

Mei-Fang Chiang et al.

Proceedings of the 38th European Solid-State Circuit conference, 2012

A compression-based area-efficient recovery architecture for nonvolatile processors.

Mei-Fang Chiang et al.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

A hierarchical C2RTL framework for FIFO-connected stream applications.

Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012

2011

An Energy Efficient Sensor Network Processor with Latency-Aware Adaptive Compression.

IEICE Trans. Electron., 2011