30.2 A 22nm 0.26nW/Synapse Spike-Driven Spiking Neural Network Processing Unit Using Time-Step-First Dataflow and Sparsity-Adaptive In-Memory Computing.

Ying Liu

Yufei Ma

Proceedings of the IEEE International Solid-State Circuits Conference, 2024

AIG-CIM: A Scalable Chiplet Module with Tri-Gear Heterogeneous Compute-in-Memory for Diffusion Acceleration.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

An In-Memory Computing Accelerator with Reconfigurable Dataflow for Multi-Scale Vision Transformer with Hybrid Topology.

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

S2D-CIM: A 22nm 128Kb Systolic Digital Compute-in-Memory Macro with Domino Data Path for Flexible Vector Operation and 2-D Weight Update in Edge AI Applications.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2024

Quartet: A 22nm 0.09mJ/Inference Digital Compute-in-Memory Versatile AI Accelerator with Heterogeneous Tensor Engines and Off-Chip-Less Dataflow.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2024

2023

Research progress on low-power artificial intelligence of things (AIoT) chip design.

Sci. China Inf. Sci., October, 2023

An 82-nW 0.53-pJ/SOP Clock-Free Spiking Neural Network With 40-μs Latency for AIoT Wake-Up Functions Using a Multilevel-Event-Driven Bionic Architecture and Computing-in-Memory Technique.

IEEE Trans. Circuits Syst. I Regul. Pap., 2023

A 22nm Delta-Sigma Computing-In-Memory (ΔΣCIM) SRAM Macro with Near-Zero-Mean Outputs and LSB-First ADCs Achieving 21.38TOPS/W for 8b-MAC Edge AI Processing.

Proceedings of the IEEE International Solid-State Circuits Conference, 2023

DCIM-3DRec: A 3D Reconstruction Accelerator with Digital Computing-in-Memory and Octree-Based Scheduler.

Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2023

A Model-Specific End-to-End Design Methodology for Resource-Constrained TinyML Hardware.

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A 22nm 0.43pJ/SOP Sparsity-Aware In-Memory Neuromorphic Computing System with Hybrid Spiking and Artificial Neural Network and Configurable Topology.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2023

RIMAC: An Array-Level ADC/DAC-Free ReRAM-Based in-Memory DNN Processor with Analog Cache and Computation.

Proceedings of the 28th Asia and South Pacific Design Automation Conference, 2023

2022

A Flexible and Efficient FPGA Accelerator for Various Large-Scale and Lightweight CNNs.

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs.

Zhiyuan Chen

Yufei Ma

Zhongfeng Wang

IEEE Trans. Circuits Syst. I Regul. Pap., 2022

DCIM-GCN: Digital Computing-in-Memory to Efficiently Accelerate Graph Convolutional Networks.

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

2021

SWIFT: Small-World-based Structural Pruning to Accelerate DNN Inference on FPGA.

Proceedings of the FPGA '21: The 2021 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Virtual Event, USA, February 28, 2021

2020

Performance Modeling for CNN Inference Accelerators on FPGA.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Automatic Compilation of Diverse CNNs Onto High-Performance FPGA Accelerators.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Efficient Inference of Large-Scale and Lightweight Convolutional Neural Networks on FPGA.

Xiao Wu

Yufei Ma

Zhongfeng Wang

Proceedings of the 33rd IEEE International System-on-Chip Conference, 2020

Efficient Hardware Post Processing of Anchor-Based Object Detection on FPGA.

Proceedings of the 2020 IEEE Computer Society Annual Symposium on VLSI, 2020

Optimizing Stochastic Computing for Low Latency Inference of Convolutional Neural Networks.

Zhiyuan Chen

Yufei Ma

Zhongfeng Wang

Proceedings of the IEEE/ACM International Conference On Computer Aided Design, 2020

In-Memory Computing: The Next-Generation AI Computing Paradigm.

Proceedings of the GLSVLSI '20: Great Lakes Symposium on VLSI 2020, 2020

An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference.

Jiayu Wen

Yufei Ma

Zhongfeng Wang

Proceedings of the 2020 IEEE Asia Pacific Conference on Circuits and Systems, 2020

2019

Efficient Network Construction Through Structural Plasticity.

IEEE J. Emerg. Sel. Topics Circuits Syst., 2019

Automatic Compiler Based FPGA Accelerator for CNN Training.

Shreyas Kolala Venkataramanaiah

Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

2018

Hardware Acceleration of Deep Convolutional Neural Networks on FPGA.

Yufei Ma

PhD thesis, 2018

Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA.

IEEE Trans. Very Large Scale Integr. Syst., 2018

ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler.

Integr., 2018

Algorithm-hardware co-design of single shot detector for fast object detection on FPGAs.

Proceedings of the International Conference on Computer-Aided Design, 2018

2017

End-to-end scalable FPGA accelerator for deep residual networks.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

An automatic RTL compiler for high-throughput FPGA implementation of diverse deep convolutional neural networks.

Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Optimizing Loop Operation and Dataflow in FPGA Acceleration of Deep Convolutional Neural Networks.

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

2016

Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA.

Proceedings of the 26th International Conference on Field Programmable Logic and Applications, 2016

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks.

Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

2015

Energy-efficient reconstruction of compressively sensed bioelectrical signals with stochastic computing circuits.

Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

Yufei Ma
