2025
CSA-FCN: Channel- and Spatial-Gated Attention Mechanism Based Fully Complex-Valued Neural Network for System Matrix Calibration in Magnetic Particle Imaging.
IEEE Trans. Computational Imaging, 2025
2024
Monotone Accelerated Proximal Gradient Network For Bioluminescence Tomography Reconstruction.
Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024
FMT-ReconNet: Fluorescence Molecular Tomography Reconstruction using Prior Knowledge and Deformation Neural Network.
Proceedings of the 46th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2024
2023
E-Booster: A Field-Programmable Gate Array-Based Accelerator for Secure Tree Boosting Using Additively Homomorphic Encryption.
IEEE Micro, 2023
Multi-target reconstruction of fluorescence molecular tomography based on blind source separation.
Proceedings of the Medical Imaging 2023: Image Processing, 2023
Based on model-driven fast iterative shrinkage thresholding network for bioluminescence tomography reconstruction.
Proceedings of the Medical Imaging 2023: Image Processing, 2023
DF-GAS: a Distributed FPGA-as-a-Service Architecture towards Billion-Scale Graph-based Approximate Nearest Neighbor Search.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
TT-GNN: Efficient On-Chip Graph Neural Network Training via Embedding Reformation and Hardware Optimization.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Structure-fused deep 3D hierarchical network: A bioluminescence tomography scheme for different imaging objects.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023
2022
Efficient Processing of Sparse Tensor Decomposition via Unified Abstraction and PE-Interactive Architecture.
IEEE Trans. Computers, 2022
EPQuant: A Graph Neural Network compression approach based on product quantization.
Neurocomputing, 2022
Practical Near-Data-Processing Architecture for Large-Scale Distributed Graph Neural Network.
IEEE Access, 2022
184QPS/W 64Mb/mm² 3D Logic-to-DRAM Hybrid Bonding with Process-Near-Memory Engine for Recommendation System.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
Hyperscale FPGA-as-a-service architecture for large-scale distributed graph neural network.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
2021
DLUX: A LUT-Based Near-Bank Accelerator for Data Center Deep Learning Training Workloads.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021
Overcoming the Memory Hierarchy Inefficiencies in Graph Processing Applications.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2021
L1-L2 Minimization Via A Proximal Operator For Fluorescence Molecular Tomography.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021
End-To-End Bioluminescence Tomography Reconstruction Based On Convolution Neural Network Scheme.
Proceedings of the 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2021
2020
NNBench-X: A Benchmarking Methodology for Neural Network Accelerator Designs.
ACM Trans. Archit. Code Optim., 2020
Tianjic: A Unified and Scalable Chip Bridging Spike-Based and Continuous Neural Computation.
IEEE J. Solid State Circuits, 2020
GNNAdvisor: An Efficient Runtime System for GNN Acceleration on GPUs.
CoRR, 2020
DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Boosting Deep Neural Network Efficiency with Dual-Module Inference.
Proceedings of the 37th International Conference on Machine Learning, 2020
NEST: DIMM based Near-Data-Processing Accelerator for K-mer Counting.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020
fuseGNN: Accelerating Graph Convolutional Neural Network Training on GPGPU.
Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020
Fulcrum: A Simplified Control and Access Mechanism Toward Flexible and Practical In-Situ Accelerators.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020
DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints.
Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020
2019
Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.
IEEE Trans. Parallel Distributed Syst., 2019
Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints.
CoRR, 2019
NNBench-X: Benchmarking and Understanding Neural Network Workloads for Accelerator Designs.
IEEE Comput. Archit. Lett., 2019
Alleviating Irregularity in Graph Analytics Acceleration: a Hardware/Software Co-Design Approach.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
MEDAL: Scalable DIMM based Near Data Processing Accelerator for DNA Seeding Algorithm.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Balancing Memory Accesses for Energy-Efficient Graph Analytics Accelerators.
Proceedings of the 2019 IEEE/ACM International Symposium on Low Power Electronics and Design, 2019
Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019
CNNWire: Boosting Convolutional Neural Network with Winograd on ReRAM based Accelerators.
Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019
Memory Trojan Attack on Neural Network Accelerators.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019
Near-Data Acceleration of Privacy-Preserving Biomarker Search with 3D-Stacked Memory.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019
Memory-Bound Proof-of-Work Acceleration for Blockchain Applications.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
FPSA: A Full System Stack Solution for Reconfigurable ReRAM-based NN Accelerator Architecture.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019
AERIS: area/energy-efficient 1T2R ReRAM based processing-in-memory neural network system-on-a-chip.
Proceedings of the 24th Asia and South Pacific Design Automation Conference, 2019
2018
Memory-Centric Architectures: Bridging the Gap Between Compute and Memory.
PhD thesis, 2018
Securing Emerging Nonvolatile Main Memory With Fast and Energy-Efficient AES In-Memory Implementation.
IEEE Trans. Very Large Scale Integr. Syst., 2018
In-memory multiplication engine with SOT-MRAM based stochastic computing.
CoRR, 2018
Exploring Core and Cache Hierarchy Bottlenecks in Graph Processing Workloads.
IEEE Comput. Archit. Lett., 2018
SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
Persistence Parallelism Optimization: A Holistic Approach from Memory Bus to RDMA Network.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018
GraphIA: an in-situ accelerator for large-scale graph processing.
Proceedings of the International Symposium on Memory Systems, 2018
AIM: Fast and energy-efficient AES in-memory implementation for emerging non-volatile main memory.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
RADAR: a 3D-reRAM based DNA alignment accelerator architecture.
Proceedings of the 55th Annual Design Automation Conference, 2018
Cost-efficient 3D Integration to Hinder Reverse Engineering During and After Manufacturing.
Proceedings of the Asian Hardware Oriented Security and Trust Symposium, 2018
2017
DRISA: a DRAM-based reconfigurable in-situ accelerator.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Building energy-efficient multi-level cell STT-RAM caches with data compression.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017
2016
A C2RTL Framework Supporting Partition, Parallelization, and FIFO Sizing for Streaming Applications.
ACM Trans. Design Autom. Electr. Syst., 2016
Nonvolatile Processor Architectures: Efficient, Reliable Progress with Unstable Power.
IEEE Micro, 2016
NEUTRAMS: Neural network transformation and co-design under neuromorphic hardware constraints.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory.
Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture, 2016
NVSim-CAM: a circuit-level simulator for emerging nonvolatile memory based content-addressable memory.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
Leveraging 3D Technologies for Hardware Security: Opportunities and Challenges.
Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016
Pinatubo: a processing-in-memory architecture for bulk bitwise operations in emerging non-volatile memories.
Proceedings of the 53rd Annual Design Automation Conference, 2016
Architecture design with STT-RAM: Opportunities and challenges.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016
2015
Nonvolatile Processor Architecture Exploration for Energy-Harvesting Applications.
IEEE Micro, 2015
Leveraging nonvolatility for architecture design with emerging NVM.
Proceedings of the IEEE Non-Volatile Memory System and Applications Symposium, 2015
Leveraging emerging nonvolatile memory in high-level synthesis with loop transformations.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015
Architecture exploration for ambient energy harvesting nonvolatile processors.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Ambient energy harvesting nonvolatile processors: from circuit to system.
Proceedings of the 52nd Annual Design Automation Conference, 2015
Nonvolatile memory allocation and hierarchy optimization for high-level synthesis.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015
2014
PaCC: A Parallel Compare and Compress Codec for Area Reduction in Nonvolatile Processors.
IEEE Trans. Very Large Scale Integr. Syst., 2014
Intra-task scheduling for storage-less and converter-less solar-powered nonvolatile sensor nodes.
Proceedings of the 32nd IEEE International Conference on Computer Design, 2014
2013
Utilizing voltage-frequency islands in C-to-RTL synthesis for streaming applications.
Proceedings of the Design, Automation and Test in Europe, 2013
Optimal partition with block-level parallelization in C-to-RTL synthesis for streaming applications.
Proceedings of the 18th Asia and South Pacific Design Automation Conference, 2013
2012
A 3µs wake-up time nonvolatile processor based on ferroelectric flip-flops.
Proceedings of the 38th European Solid-State Circuits Conference, 2012
A compression-based area-efficient recovery architecture for nonvolatile processors.
Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012
A hierarchical C2RTL framework for FIFO-connected stream applications.
Proceedings of the 17th Asia and South Pacific Design Automation Conference, 2012
2011
An Energy Efficient Sensor Network Processor with Latency-Aware Adaptive Compression.
IEICE Trans. Electron., 2011