Zhiyi Yu

ORCID: 0000-0002-8802-0457

According to our database, Zhiyi Yu authored at least 132 papers between 1998 and 2024.

Bibliography

2024
Toward Efficient Asynchronous Circuits Design Flow Using Backward Delay Propagation Constraint.
IEEE Trans. Very Large Scale Integr. Syst., October, 2024

SC-PLR: An Approximate Spiking Neural Network Accelerator With On-Chip Predictive Learning Rule.
IEEE Trans. Biomed. Circuits Syst., October, 2024

LAC: A Novel Lightweight Asynchronous Controller With Optimized Phase Shift.
IEEE Trans. Circuits Syst. II Express Briefs, August, 2024

Low-Power, High-Speed, and Area-Efficient Multiplier Based on the PTL Logic Style.
IEEE Trans. Circuits Syst. II Express Briefs, July, 2024

Better-Than-Worst-Case: A Frequency Adaptation Asynchronous RISC-V Core With Vector Extension.
IEEE Trans. Very Large Scale Integr. Syst., June, 2024

Contrastive learning for unsupervised sentence embeddings using negative samples with diminished semantics.
J. Supercomput., March, 2024

Enhancing text classification with attention matrices based on BERT.
Expert Syst. J. Knowl. Eng., March, 2024

Enhancing aspect-based sentiment analysis with dependency-attention GCN and mutual assistance mechanism.
J. Intell. Inf. Syst., February, 2024

An Adaptive Re-Read Strategy for Mitigating Temporary Read Errors in 3D-NAND Flash Solid-State Drives.
Proceedings of the 13th Non-Volatile Memory Systems and Applications Symposium, 2024

Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

DAW-DMR: Divergence-Aware Warped DMR with Full Error Detection for GPGPUs.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2024

BafSP: Co-Design of Compute SRAM and Bit-Aware Data Flip Mitigation with In-Memory Sparsity Detection for SpMM.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2024

Sparsespikformer: A Co-Design Framework for Token and Weight Pruning in Spiking Transformer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

An Efficient Asynchronous Circuits Design Flow with Backward Delay Propagation Constraint.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

2023
TensorCache: Reconstructing Memory Architecture With SRAM-Based In-Cache Computing for Efficient Tensor Computations in GPGPUs.
IEEE Trans. Very Large Scale Integr. Syst., December, 2023

A High-Density and Reconfigurable SRAM-Based Digital Compute-In-Memory Macro for Low-Power AI Chips.
IEEE Trans. Circuits Syst. II Express Briefs, September, 2023

Parallel-Prefix Adder in Spin-Orbit Torque Magnetic RAM for High Bit-Width Non-Volatile Computation.
IEEE Trans. Circuits Syst. II Express Briefs, February, 2023

High-Reliability, Reconfigurable, and Fully Non-volatile Full-Adder Based on SOT-MTJ for Image Processing Applications.
IEEE Trans. Circuits Syst. II Express Briefs, February, 2023

Re-Cache: Mitigating cache contention by exploiting locality characteristics with reconfigurable memory hierarchy for GPGPUs.
Microelectron. J., 2023

Computing Resistance-Style Image Sensors for Artificial Neural Networks.
IEEE Internet Things J., 2023

A Low Power 100-Gb/s PAM-4 Driver with Linear Distortion Compensation in 65-nm CMOS.
IEICE Trans. Electron., 2023

A Low Insertion Loss Wideband Bonding-Wire Based Interconnection for 400 Gbps PAM4 Transceivers.
IEICE Trans. Electron., 2023

A 6T-3M SOT-MRAM for in-memory computing with reconfigurable arithmetic operations.
IEICE Electron. Express, 2023

Enabling energy-efficient object detection with surrogate gradient descent in spiking neural networks.
CoRR, 2023

A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2023

A 1.97 TFLOPS/W Configurable SRAM-Based Floating-Point Computation-in-Memory Macro for Energy-Efficient AI Chips.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

Towards Energy-Efficient Asynchronous Circuit Design with Flip-Flop-to-Latch Replacement.
Proceedings of the IEEE International Conference on Integrated Circuits, 2023

Towards Efficient On-Chip Learning for Spiking Neural Networks Accelerator with Surrogate Gradient.
Proceedings of the IEEE International Conference on Integrated Circuits, 2023

A Scalable Deadlock-Free Static Routing Algorithm for Chiplet-Based Systems.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

LWSDP: Locality-Aware Warp Scheduling and Dynamic Data Prefetching Co-design in the Per-SM Private Cache of GPGPUs.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

CRAFT: Common Router Architecture for Throughput Optimization.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2023

2022
An Asynchronous Bundled-Data Template With Current Sensing Completion Detection Technique.
IEEE Trans. Circuits Syst. II Express Briefs, 2022

A dual-rail/single-rail hybrid system using null convention logic circuits.
Microelectron. J., 2022

An in-memory computing multiply-and-accumulate circuit based on ternary STT-MRAMs for convolutional neural networks.
IEICE Electron. Express, 2022

Hardware Based RISC-V Instruction Set Randomization.
Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

DFT Architecture for Click-Based Bundled-Data Asynchronous Circuits.
Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

3D-NWA: A Nested-Winograd Accelerator for 3D CNNs.
Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

One Case of THOUGHT: Industry-University Converged Education Practice on Open Source.
Proceedings of the Computer Science and Education - 17th International Conference, 2022

2021
Balancing the Cost and Performance Trade-Offs in SNN Processors.
IEEE Trans. Circuits Syst. II Express Briefs, 2021

A Low-Power Asynchronous RISC-V Processor With Propagated Timing Constraints Method.
IEEE Trans. Circuits Syst. II Express Briefs, 2021

An MTJ-Based Asynchronous System With Extremely Fine-Grained Voltage Scaling.
IEEE Trans. Circuits Syst. I Regul. Pap., 2021

A Data-Driven Asynchronous Neural Network Accelerator.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Hand gesture recognition algorithm combining hand-type adaptive algorithm and effective-area ratio for efficient edge computing.
J. Electronic Imaging, 2021

High-parallelism Inception-like Spiking Neural Networks for Unsupervised Feature Learning.
Neurocomputing, 2021

Dimension fusion: Dimension-level dynamically composable accelerator for convolutional neural networks.
IEICE Electron. Express, 2021

OERFF: A Vehicle Re-Identification Method Based on Orientation Estimation and Regional Feature Fusion.
IEEE Access, 2021

High-Throughput Zipper Encoder for 800G Optical Communication System.
Proceedings of the 2021 IEEE International Conference on Integrated Circuits, 2021

A high throughput spatially coupled low density generator matrix coding system.
Proceedings of the 2021 IEEE International Conference on Integrated Circuits, 2021

FWUA: A Flexible Winograd-Based Uniform Accelerator for 1D/2D/3D CNNs.
Proceedings of the 2021 IEEE International Conference on Integrated Circuits, 2021

3D-VNPU: A Flexible Accelerator for 2D/3D CNNs on FPGA.
Proceedings of the 29th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021

2020
NeuronLink: An Efficient Chip-to-Chip Interconnect for Large-Scale Neural Network Accelerators.
IEEE Trans. Very Large Scale Integr. Syst., 2020

Low-Cost Adaptive Exponential Integrate-and-Fire Neuron Using Stochastic Computing.
IEEE Trans. Biomed. Circuits Syst., 2020

DM-IMCA: A dual-mode in-memory computing architecture for general purpose processing.
IEICE Electron. Express, 2020

BioSNet: A Fast-Learning and High-Robustness Unsupervised Biomimetic Spiking Neural Network.
CoRR, 2020

A Low-Cost and High-Throughput NoC-Aware Chip-to-Chip Interconnection.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2020

Spiking Inception Module for Multi-layer Unsupervised Spiking Neural Networks.
Proceedings of the 2020 International Joint Conference on Neural Networks, 2020

A Low-Power Processing Element Based on Asynchronous Data-Driven Bit-Serial Multiplier for CNNs.
Proceedings of the 2020 IEEE International Conference on Integrated Circuits, 2020

An Asynchronous Convolution Process Engine for VGG-16 Neural Network.
Proceedings of the 2020 IEEE International Conference on Integrated Circuits, 2020

SPA: Stochastic Probability Adjustment for System Balance of Unsupervised SNNs.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

2019
A 68-mW 2.2 TOPS/W Low Bit Width and Multiplierless DCNN Object Detection Processor for Visually Impaired People.
IEEE Trans. Circuits Syst. Video Technol., 2019

2018
A Flexible and Energy-Efficient Convolutional Neural Network Acceleration With Dedicated ISA and Accelerator.
IEEE Trans. Very Large Scale Integr. Syst., 2018

An Automatic Task Partition Method for Multi-core System.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

A Reconfigurable Process Engine for Flexible Convolutional Neural Network Acceleration.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
An FPGA-Based Hardware Accelerator for Traffic Sign Detection.
IEEE Trans. Very Large Scale Integr. Syst., 2017

A Scalable Network-on-Chip Microprocessor With 2.5D Integrated Memory and Accelerator.
IEEE Trans. Circuits Syst. I Regul. Pap., 2017

A multi-core-based heterogeneous parallel turbo decoder.
IEICE Electron. Express, 2017

Parallel implementations of SHA-3 on a 24-core processor with software and hardware co-design.
Proceedings of the 12th IEEE International Conference on ASIC, 2017

The write deduplication mechanism based on a novel low-power data latched sense amplifier for a magnetic tunnel junction based non-volatile memory.
Proceedings of the 12th IEEE International Conference on ASIC, 2017

A fast and energy efficient FPGA-based system for real-time object tracking.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2015
Design and Analysis of Highly Energy/Area-Efficient Multiported Register Files With Read Word-Line Sharing Strategy in 65-nm CMOS Process.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Many-Core Processors Granularity Evaluation by Considering Performance, Yield, and Lifetime Reliability.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A 65 nm Cryptographic Processor for High Speed Pairing Computation.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A Heterogeneous Multicore Crypto-Processor With Flexible Long-Word-Length Computation.
IEEE Trans. Circuits Syst. I Regul. Pap., 2015

Special Issue on Emerging Many-Core Systems for Exascale Computing.
ACM J. Emerg. Technol. Comput. Syst., 2015

Non-binary digital calibration for split-capacitor DAC in SAR ADC.
IEICE Electron. Express, 2015

A scalable and reconfigurable 2.5D integrated multicore processor on silicon interposer.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

A configurable SoC design for information security.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

Parallel implementation of AES on 2.5D multicore platform with hardware and software co-design.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

A lifting-based 2-D discrete wavelet transform architecture for data compression of bio-potential signals.
Proceedings of the 2015 IEEE 11th International Conference on ASIC, 2015

2014
An Efficient Implementation of Montgomery Multiplication on Multicore Platform With Optimized Algorithm, Task Partitioning, and Network Architecture.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Low-Power Multicore Processor Design With Reconfigurable Same-Instruction Multiple Process.
IEEE Trans. Circuits Syst. II Express Briefs, 2014

A 16-Core Processor With Shared-Memory and Message-Passing Communications.
IEEE Trans. Circuits Syst. I Regul. Pap., 2014

An area-efficient dual replica-bitline delay technique for process-variation-tolerant low voltage SRAM sense amplifier timing.
IEICE Electron. Express, 2014

Acceleration of Naive-Bayes algorithm on multicore processor for massive text classification.
Proceedings of the 2014 International Symposium on Integrated Circuits (ISIC), 2014

2013
Parallelization of Radix-2 Montgomery Multiplication on Multicore Platform.
IEEE Trans. Very Large Scale Integr. Syst., 2013

Architecture and Physical Implementation of Reconfigurable Multi-Port Physical Unclonable Functions in 65 nm CMOS.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Efficient distributed memory management in a multi-core H.264 decoder on FPGA.
Proceedings of the 2013 International Symposium on System on Chip, 2013

A 65nm 39GOPS/W 24-core processor with 11Tb/s/W packet-controlled circuit-switched double-layer network-on-chip and heterogeneous execution array.
Proceedings of the 2013 IEEE International Solid-State Circuits Conference, 2013

A low power register file with asynchronously controlled read-isolation and software-directed write-discarding.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Time-Division-Multiplexer based routing algorithm for NoC system.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Implementation and optimization of 3780-point FFT on multi-core system.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

H.264 video parallel decoder on a 24-core processor.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

A 2D mesh NoC with self-configurable and shared-FIFOs routers.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

A turbo decoder implementation for LTE downlink mapped on a multi-core processor platform.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

Efficient implementation of 3780-point FFT on a 16-core processor.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

A fast multi-core virtual platform and its application on software development.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

A hybrid router combining circuit switching and packet switching with virtual channels for on-chip networks.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

2012
A Fully Programmable Reed-Solomon Decoder on a Multi-Core Processor Platform.
IEICE Trans. Inf. Syst., 2012

Efficient Implementation of OFDM Inner Receiver on a Programmable Multi-Core Processor Platform.
IEICE Trans. Commun., 2012

Design of a high information-density multiple valued 2-read 1-write register file.
IEICE Electron. Express, 2012

A 64×32bit 4-read 2-write low power and area efficient register file in 65nm CMOS.
IEICE Electron. Express, 2012

An 800MHz 320mW 16-core processor with message-passing and shared-memory inter-core communication mechanisms.
Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

A low-cost architecture for multi-mode Reed-Solomon decoder.
Proceedings of the International SoC Design Conference, 2012

Task-binding based branch-and-bound algorithm for NoC mapping.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

A pure software LDPC decoder on a multi-core processor platform with reduced inter-processor communication cost.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Evaluating performance of manycore processors with various granularities considering yield and lifetime reliability.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

2011
A Scalable and Reconfigurable Fault-Tolerant Distributed Routing Algorithm for NoCs.
IEICE Trans. Inf. Syst., 2011

Fault tolerant computing for stream DSP applications using GALS multi-core processors.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2011), 2011

A reconfigurable and deadlock-free routing algorithm for 2D Mesh Network-on-Chip.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2011), 2011

An optimized mapping algorithm based on Simulated Annealing for regular NoC architecture.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A low power 1.0 GHz VCO in 65nm-CMOS LP-process.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Modified Minimal-Connected-Component fault block model to deal with defective links and nodes for 2D-mesh NoCs.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Design of a single-ended cell based 65nm 32×32b 4R2W register file.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A method of quadratic programming for mapping on NoC architecture.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A channel estimator for LTE downlink mapped on a multi-core processor platform.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

A control scheme for a 65nm 32×32b 4-read 2-write register file.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010
A Low-Area Multi-Link Interconnect Architecture for GALS Chip Multiprocessors.
IEEE Trans. Very Large Scale Integr. Syst., 2010

A Cost-Efficient LDPC Decoder for DVB-S2 with the Solution to Address Conflict Issue.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2010

A scalable and fault-tolerant routing algorithm for NoCs.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

2009
High Performance, Energy Efficiency, and Scalability With GALS Chip Multiprocessors.
IEEE Trans. Very Large Scale Integr. Syst., 2009

A 167-Processor Computational Platform in 65 nm CMOS.
IEEE J. Solid State Circuits, 2009

2008
Architecture and Evaluation of an Asynchronous Array of Simple Processors.
J. Signal Process. Syst., 2008

AsAP: An Asynchronous Array of Simple Processors.
IEEE J. Solid State Circuits, 2008

A low-area interconnect architecture for chip multiprocessors.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

2007
A Scalable Dual-Clock FIFO for Data Transfers Between Arbitrary and Haltable Clock Domains.
IEEE Trans. Very Large Scale Integr. Syst., 2007

AsAP: A Fine-Grained Many-Core Platform for DSP Applications.
IEEE Micro, 2007

A Shared Memory Module for Asynchronous Arrays of Processors.
EURASIP J. Embed. Syst., 2007

2006
Performance and Power Analysis of Globally Asynchronous Locally Synchronous Multi-Processor Systems.
Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

An asynchronous array of simple processors for DSP applications.
Proceedings of the 2006 IEEE International Solid State Circuits Conference, 2006

Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles.
Proceedings of the 24th International Conference on Computer Design (ICCD 2006), 2006

Hardware and applications of AsAP: An asynchronous array of simple processors.
Proceedings of the 2006 IEEE Hot Chips 18 Symposium (HCS), 2006

1998
Optical measurement system for characterizing compound semiconductor interface and surface states.
IEEE Trans. Instrum. Meas., 1998
