Zhiyi Yu

Hong Li

Jialin Feng

J. Supercomput., March, 2024

Enhancing text classification with attention matrices based on BERT.

Hong Li

Jialin Feng

Expert Syst. J. Knowl. Eng., March, 2024

Enhancing aspect-based sentiment analysis with dependency-attention GCN and mutual assistance mechanism.

Jialin Feng

Hong Li

J. Intell. Inf. Syst., February, 2024

An Adaptive Re-Read Strategy for Mitigating Temporary Read Errors in 3D-NAND Flash Solid-State Drives.

Proceedings of the 13th Non-Volatile Memory Systems and Applications Symposium, 2024

Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

DAW-DMR: Divergence-Aware Warped DMR with Full Error Detection for GPGPUs.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2024

BafSP: Co-Design of Compute SRAM and Bit-Aware Data Flip Mitigation with In-Memory Sparsity Detection for SpMM.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2024

SparseSpikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

An Efficient Asynchronous Circuits Design Flow with Backward Delay Propagation Constraint.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023

TensorCache: Reconstructing Memory Architecture With SRAM-Based In-Cache Computing for Efficient Tensor Computations in GPGPUs.

IEEE Trans. Very Large Scale Integr. Syst., December, 2023

A High-Density and Reconfigurable SRAM-Based Digital Compute-In-Memory Macro for Low-Power AI Chips.

IEEE Trans. Circuits Syst. II Express Briefs, September, 2023

Parallel-Prefix Adder in Spin-Orbit Torque Magnetic RAM for High Bit-Width Non-Volatile Computation.

IEEE Trans. Circuits Syst. II Express Briefs, February, 2023

High-Reliability, Reconfigurable, and Fully Non-volatile Full-Adder Based on SOT-MTJ for Image Processing Applications.

IEEE Trans. Circuits Syst. II Express Briefs, February, 2023

Re-Cache: Mitigating cache contention by exploiting locality characteristics with reconfigurable memory hierarchy for GPGPUs.

Microelectron. J., 2023

Computing Resistance-Style Image Sensors for Artificial Neural Networks.

IEEE Internet Things J., 2023

A Low Power 100-Gb/s PAM-4 Driver with Linear Distortion Compensation in 65-nm CMOS.

IEICE Trans. Electron., 2023

A Low Insertion Loss Wideband Bonding-Wire Based Interconnection for 400 Gbps PAM4 Transceivers.

Xiangyu Meng

Yecong Li

IEICE Trans. Electron., 2023

A 6T-3M SOT-MRAM for in-memory computing with reconfigurable arithmetic operations.

IEICE Electron. Express, 2023

Enabling energy-efficient object detection with surrogate gradient descent in spiking neural networks.

CoRR, 2023

A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2023

A 1.97 TFLOPS/W Configurable SRAM-Based Floating-Point Computation-in-Memory Macro for Energy-Efficient AI Chips.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

Towards Energy-Efficient Asynchronous Circuit Design with Flip-Flop-to-Latch Replacement.

Proceedings of the IEEE International Conference on Integrated Circuits, 2023

Towards Efficient On-Chip Learning for Spiking Neural Networks Accelerator with Surrogate Gradient.

Proceedings of the IEEE International Conference on Integrated Circuits, 2023

A Scalable Deadlock-Free Static Routing Algorithm for Chiplet-Based Systems.

Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

LWSDP: Locality-Aware Warp Scheduling and Dynamic Data Prefetching Co-design in the Per-SM Private Cache of GPGPUs.

Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

CRAFT: Common Router Architecture for Throughput Optimization.

Proceedings of the Algorithms and Architectures for Parallel Processing, 2023

2022

An Asynchronous Bundled-Data Template With Current Sensing Completion Detection Technique.

IEEE Trans. Circuits Syst. II Express Briefs, 2022

A dual-rail/single-rail hybrid system using null convention logic circuits.

Microelectron. J., 2022

An in-memory computing multiply-and-accumulate circuit based on ternary STT-MRAMs for convolutional neural networks.

IEICE Electron. Express, 2022

Hardware Based RISC-V Instruction Set Randomization.

Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

DFT Architecture for Click-Based Bundled-Data Asynchronous Circuits.

Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

3D-NWA: A Nested-Winograd Accelerator for 3D CNNs.

Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

One Case of THOUGHT: Industry-University Converged Education Practice on Open Source.

Proceedings of the Computer Science and Education - 17th International Conference, 2022

2021

Balancing the Cost and Performance Trade-Offs in SNN Processors.

IEEE Trans. Circuits Syst. II Express Briefs, 2021

A Low-Power Asynchronous RISC-V Processor With Propagated Timing Constraints Method.

IEEE Trans. Circuits Syst. II Express Briefs, 2021

An MTJ-Based Asynchronous System With Extremely Fine-Grained Voltage Scaling.

IEEE Trans. Circuits Syst. I Regul. Pap., 2021

A Data-Driven Asynchronous Neural Network Accelerator.

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Hand gesture recognition algorithm combining hand-type adaptive algorithm and effective-area ratio for efficient edge computing.

J. Electronic Imaging, 2021

High-parallelism Inception-like Spiking Neural Networks for Unsupervised Feature Learning.

Neurocomputing, 2021

Dimension fusion: Dimension-level dynamically composable accelerator for convolutional neural networks.

IEICE Electron. Express, 2021

OERFF: A Vehicle Re-Identification Method Based on Orientation Estimation and Regional Feature Fusion.

IEEE Access, 2021

High-Throughput Zipper Encoder for 800G Optical Communication System.

Proceedings of the 2021 IEEE International Conference on Integrated Circuits, 2021

A high throughput spatially coupled low density generator matrix coding system.

Proceedings of the 2021 IEEE International Conference on Integrated Circuits, 2021

FWUA: A Flexible Winograd-Based Uniform Accelerator for 1D/2D/3D CNNs.

Proceedings of the 2021 IEEE International Conference on Integrated Circuits, 2021

3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA.

Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020

NeuronLink: An Efficient Chip-to-Chip Interconnect for Large-Scale Neural Network Accelerators.

IEEE Trans. Very Large Scale Integr. Syst., 2020

Low-Cost Adaptive Exponential Integrate-and-Fire Neuron Using Stochastic Computing.

IEEE Trans. Biomed. Circuits Syst., 2020

DM-IMCA: A dual-mode in-memory computing architecture for general purpose processing.

IEICE Electron. Express, 2020

BioSNet: A Fast-Learning and High-Robustness Unsupervised Biomimetic Spiking Neural Network.

CoRR, 2020

A Low-Cost and High-Throughput NoC-Aware Chip-to-Chip Interconnection.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

Spiking Inception Module for Multi-layer Unsupervised Spiking Neural Networks.

Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

A Low-Power Processing Element Based on Asynchronous Data-Driven Bit-Serial Multiplier for CNNs.

Proceedings of the 2020 IEEE International Conference on Integrated Circuits, 2020

An Asynchronous Convolution Process Engine for VGG-16 Neural Network.

Proceedings of the 2020 IEEE International Conference on Integrated Circuits, 2020

SPA: Stochastic Probability Adjustment for System Balance of Unsupervised SNNs.

Proceedings of the 25th International Conference on Pattern Recognition, 2020

2019

A 68-mW 2.2 TOPS/W Low Bit Width and Multiplierless DCNN Object Detection Processor for Visually Impaired People.

Jinglong Xu

IEEE Trans. Circuits Syst. Video Technol., 2019

2018

A Flexible and Energy-Efficient Convolutional Neural Network Acceleration With Dedicated ISA and Accelerator.

IEEE Trans. Very Large Scale Integr. Syst., 2018

An Automatic Task Partition Method for Multi-core System.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

A Reconfigurable Process Engine for Flexible Convolutional Neural Network Acceleration.

Shanlin Xiao

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017

An FPGA-Based Hardware Accelerator for Traffic Sign Detection.

IEEE Trans. Very Large Scale Integr. Syst., 2017

A Scalable Network-on-Chip Microprocessor With 2.5D Integrated Memory and Accelerator.

IEEE Trans. Circuits Syst. I Regul. Pap., 2017

A multi-core-based heterogeneous parallel turbo decoder.

IEICE Electron. Express, 2017

Parallel implementations of SHA-3 on a 24-core processor with software and hardware co-design.

Proceedings of the 12th IEEE International Conference on ASIC, 2017

The write deduplication mechanism based on a novel low-power data latched sense amplifier for a magnetic tunnel junction based non-volatile memory.

Baofa Huang

Ningyuan Yin

Proceedings of the 12th IEEE International Conference on ASIC, 2017

A fast and energy efficient FPGA-based system for real-time object tracking.

Jinlong Xu

Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2015

Design and Analysis of Highly Energy/Area-Efficient Multiported Register Files With Read Word-Line Sharing Strategy in 65-nm CMOS Process.

IEEE Trans. Very Large Scale Integr. Syst., 2015

Many-Core Processors Granularity Evaluation by Considering Performance, Yield, and Lifetime Reliability.

IEEE Trans. Very Large Scale Integr. Syst., 2015

A 65 nm Cryptographic Processor for High Speed Pairing Computation.

IEEE Trans. Very Large Scale Integr. Syst., 2015

A Heterogeneous Multicore Crypto-Processor With Flexible Long-Word-Length Computation.

IEEE Trans. Circuits Syst. I Regul. Pap., 2015

Special Issue on Emerging Many-Core Systems for Exascale Computing.

ACM J. Emerg. Technol. Comput. Syst., 2015

Non-binary digital calibration for split-capacitor DAC in SAR ADC.

IEICE Electron. Express, 2015

A scalable and reconfigurable 2.5D integrated multicore processor on silicon interposer.

Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

A configurable SoC design for information security.

Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

Parallel implementation of AES on 2.5D multicore platform with hardware and software co-design.

Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

A lifting-based 2-D discrete wavelet transform architecture for data compression of bio-potential signals.

Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014

An Efficient Implementation of Montgomery Multiplication on Multicore Platform With Optimized Algorithm, Task Partitioning, and Network Architecture.

IEEE Trans. Very Large Scale Integr. Syst., 2014

Low-Power Multicore Processor Design With Reconfigurable Same-Instruction Multiple Process.

IEEE Trans. Circuits Syst. II Express Briefs, 2014

A 16-Core Processor With Shared-Memory and Message-Passing Communications.

IEEE Trans. Circuits Syst. I Regul. Pap., 2014

An area-efficient dual replica-bitline delay technique for process-variation-tolerant low voltage SRAM sense amplifier timing.

IEICE Electron. Express, 2014

Acceleration of Naive-Bayes algorithm on multicore processor for massive text classification.

Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

2013

Parallelization of Radix-2 Montgomery Multiplication on Multicore Platform.

IEEE Trans. Very Large Scale Integr. Syst., 2013

Architecture and Physical Implementation of Reconfigurable Multi-Port Physical Unclonable Functions in 65 nm CMOS.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Efficient distributed memory management in a multi-core H.264 decoder on FPGA.

Proceedings of the 2013 International Symposium on System on Chip, 2013

A 65nm 39GOPS/W 24-core processor with 11Tb/s/W packet-controlled circuit-switched double-layer network-on-chip and heterogeneous execution array.

Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013

A low power register file with asynchronously controlled read-isolation and software-directed write-discarding.

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Time-Division-Multiplexer based routing algorithm for NoC system.

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Implementation and optimization of 3780-point FFT on multi-core system.

Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

H.264 video parallel decoder on a 24-core processor.

Proceedings of the IEEE 10th International Conference on ASIC, 2013

A 2D mesh NoC with self-configurable and shared-FIFOs routers.

Proceedings of the IEEE 10th International Conference on ASIC, 2013

A turbo decoder implementation for LTE downlink mapped on a multi-core processor platform.

Proceedings of the IEEE 10th International Conference on ASIC, 2013

Efficient implementation of 3780-point FFT on a 16-core processor.

Proceedings of the IEEE 10th International Conference on ASIC, 2013

A fast multi-core virtual platform and its application on software development.

Proceedings of the IEEE 10th International Conference on ASIC, 2013

A hybrid router combining circuit switching and packet switching with virtual channels for on-chip networks.

Proceedings of the IEEE 10th International Conference on ASIC, 2013

2012

A Fully Programmable Reed-Solomon Decoder on a Multi-Core Processor Platform.

IEICE Trans. Inf. Syst., 2012

Efficient Implementation of OFDM Inner Receiver on a Programmable Multi-Core Processor Platform.

IEICE Trans. Commun., 2012

Design of a high information-density multiple valued 2-read 1-write register file.

IEICE Electron. Express, 2012

A 64×32bit 4-read 2-write low power and area efficient register file in 65nm CMOS.

IEICE Electron. Express, 2012

An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms.

Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

A low-cost architecture for multi-mode Reed-Solomon decoder.

Proceedings of the International SoC Design Conference, 2012

Task-binding based branch-and-bound algorithm for NoC mapping.

Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

A pure software LDPC decoder on a multi-core processor platform with reduced inter-processor communication cost.

Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Evaluating performance of manycore processors with various granularities considering yield and lifetime reliability.

Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

2011

A Scalable and Reconfigurable Fault-Tolerant Distributed Routing Algorithm for NoCs.

Zewen Shi

Xiaoyang Zeng

IEICE Trans. Inf. Syst., 2011

Fault tolerant computing for stream DSP applications using GALS multi-core processors.

Zewen Shi

Xiaoyang Zeng

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2011), 2011

A reconfigurable and deadlock-free routing algorithm for 2D Mesh Network-on-Chip.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2011), 2011

An optimized mapping algorithm based on Simulated Annealing for regular NoC architecture.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A low power 1.0 GHz VCO in 65nm-CMOS LP-process.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Modified Minimal-Connected-Component fault block model to deal with defective links and nodes for 2D-mesh NoCs.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Design of a single-ended cell based 65nm 32×32b 4R2W register file.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A method of quadratic programming for mapping on NoC architecture.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A channel estimator for LTE downlink mapped on a multi-core processor platform.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A control scheme for a 65nm 32×32b 4-read 2-write register file.

Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010

A Low-Area Multi-Link Interconnect Architecture for GALS Chip Multiprocessors.

IEEE Trans. Very Large Scale Integr. Syst., 2010

A Cost-Efficient LDPC Decoder for DVB-S2 with the Solution to Address Conflict Issue.

IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2010

A scalable and fault-tolerant routing algorithm for NoCs.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

2009

High Performance, Energy Efficiency, and Scalability With GALS Chip Multiprocessors.

IEEE Trans. Very Large Scale Integr. Syst., 2009

A 167-Processor Computational Platform in 65 nm CMOS.

IEEE J. Solid State Circuits, 2009

2008

Architecture and Evaluation of an Asynchronous Array of Simple Processors.

J. Signal Process. Syst., 2008

AsAP: An Asynchronous Array of Simple Processors.

IEEE J. Solid State Circuits, 2008

A low-area interconnect architecture for chip multiprocessors.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

2007

A Scalable Dual-Clock FIFO for Data Transfers Between Arbitrary and Haltable Clock Domains.

IEEE Trans. Very Large Scale Integr. Syst., 2007

AsAP: A Fine-Grained Many-Core Platform for DSP Applications.

IEEE Micro, 2007

A Shared Memory Module for Asynchronous Arrays of Processors.

Michael J. Meeuwsen

EURASIP J. Embed. Syst., 2007

2006

Performance and Power Analysis of Globally Asynchronous Locally Synchronous Multi-Processor Systems.

Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

An asynchronous array of simple processors for DSP applications.

Proceedings of the 2006 IEEE International Solid State Circuits Conference, 2006

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles.
