Leibo Liu

Orcid: 0000-0001-7548-4116

According to our database1, Leibo Liu authored at least 357 papers between 2002 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2024
DNI-MDCAP: improvement of causal MiRNA-disease association prediction based on deep network imputation.
BMC Bioinform., December, 2024

Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.
IEEE J. Solid State Circuits, October, 2024

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.
IEEE J. Solid State Circuits, October, 2024

Learning the Error Features of Approximate Multipliers for Neural Network Applications.
IEEE Trans. Computers, March, 2024

A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm.
IEEE Trans. Parallel Distributed Syst., February, 2024

Hardware-Efficient Logarithmic Floating-Point Multipliers for Error-Tolerant Applications.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2024

MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.
IEEE J. Solid State Circuits, January, 2024

Breaking Ground: A New Area Record for Low-Latency First-Order Masked SHA-3 Advancing from the 4x Area Era to the 3x Area Era.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

A Low-Latency High-Order Arithmetic to Boolean Masking Conversion.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding.
CoRR, 2024

CATCAM: a 28 nm constant-time alteration TCAM enabling less than 50 ns update latency.
Sci. China Inf. Sci., 2024

16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Sparse Polynomial Multiplication-Based High-Performance Hardware Implementation for CRYSTALS-Dilithium.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2024

QUQ: Quadruplet Uniform Quantization for Efficient Vision Transformer Inference.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering.
IEEE Trans. Parallel Distributed Syst., December, 2023

Automated ICD coding using extreme multi-label long text transformer-based models.
Artif. Intell. Medicine, October, 2023

M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., September, 2023

Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators.
J. Comput. Sci. Technol., April, 2023

Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023

TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization.
IEEE J. Solid State Circuits, March, 2023

SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023

Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records.
PhD thesis, 2023

A Closer Look at the Chaotic Ring Oscillators based TRNG Design.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023

STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems.
IEEE J. Solid State Circuits, 2023

An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention.
IEEE J. Solid State Circuits, 2023

ReDCIM: Reconfigurable Digital Computing- In -Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.
IEEE J. Solid State Circuits, 2023

TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.
IEEE J. Solid State Circuits, 2023

WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow.
CoRR, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
CoRR, 2023

A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023

CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid- State Circuits Conference, 2023

TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.
Proceedings of the IEEE International Solid- State Circuits Conference, 2023

MuITCIM: A 28nm $2.24 \mu\mathrm{J}$/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.
Proceedings of the IEEE International Solid- State Circuits Conference, 2023

Shogun: A Task Scheduling Framework for Graph Mining Accelerators.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

A Low-Randomness First-Order Masked Xoodyak.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023

Mckeycutter: A High-throughput Key Generator of Classic McEliece on Hardware.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A 10TFLOPS Datacenter-Oriented GPU with 4-Corner Stacked 64GB Memory by The Means of 2.5D Packaging Technology.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

Welcome Message.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

2022
A Compact and High-Performance Hardware Architecture for CRYSTALS-Dilithium.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

BR-CIM: An Efficient Binary Representation Computation-In-Memory Design.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Characterizing Approximate Adders and Multipliers for Mitigating Aging and Temperature Degradations.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Efficient FHE Radix-2 Arithmetic Operations Based on Redundant Encoding.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Security Oriented Design Framework for EM Side-Channel Protection in RTL Implementations.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.
IEEE J. Solid State Circuits, 2022

A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
IEEE J. Solid State Circuits, 2022

De-identifying Australian hospital discharge summaries: An end-to-end framework using ensemble of deep learning models.
J. Biomed. Informatics, 2022

Hierarchical label-wise attention transformer model for explainable ICD coding.
J. Biomed. Informatics, 2022

Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece.
IACR Cryptol. ePrint Arch., 2022

HQNAS: Auto CNN deployment framework for joint quantization and architecture search.
CoRR, 2022

FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences.
CoRR, 2022

An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance.
Sci. China Inf. Sci., 2022

A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

A SHA-512 Hardware Implementation Based on Block RAM Storage Structure.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit.
Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Mixed-granularity parallel coarse-grained reconfigurable architecture.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Efficient access scheme for multi-bank based NTT architecture through conflict graph.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Software Defined Chips - Volume I, 2
Springer, ISBN: 978-981-19-6993-5, 2022

Majority Logic-Based Approximate Multipliers for Error-Tolerant Applications.
Proceedings of the Approximate Computing, 2022

2021
An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Parallel Distributed Syst., 2021

A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment.
IEEE Trans. Multim., 2021

LWRpro: An Energy-Efficient Configurable Crypto-Processor for Module-LWR.
IEEE Trans. Circuits Syst. I Regul. Pap., 2021

Non-Volatile Approximate Arithmetic Circuits Using Scalable Hybrid Spin-CMOS Majority Gates.
IEEE Trans. Circuits Syst. I Regul. Pap., 2021

Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Security-Driven Placement and Routing Tools for Electromagnetic Side-Channel Protection.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

On-Chip Trust Evaluation Utilizing TDC-Based Parameter-Adjustable Security Primitive.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters.
IEEE J. Solid State Circuits, 2021

TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
IEEE J. Solid State Circuits, 2021

Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".
IEEE J. Solid State Circuits, 2021

Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.
IEEE J. Solid State Circuits, 2021

De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of De-Identifiers.
CoRR, 2021

Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers.
Sci. China Inf. Sci., 2021

A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021, 2021

A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021, 2021

9.2A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

FuseKNA: Fused Kernel Convolution based Accelerator for Deep Neural Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

A Logarithmic Floating-Point Multiplier for the Efficient Training of Neural Networks.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

HeteroKV: A Scalable Line-rate Key-Value Store on Heterogeneous CPU-FPGA Platforms.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
Near-Optimal MIMO-SCMA Uplink Detection With Low-Complexity Expectation Propagation.
IEEE Trans. Wirel. Commun., 2020

Energy- and Area-Efficient Recursive-Conjugate-Gradient-Based MMSE Detector for Massive MIMO Systems.
IEEE Trans. Signal Process., 2020

Achieving Flexible Global Reconfiguration in NoCs Using Reconfigurable Rings.
IEEE Trans. Parallel Distributed Syst., 2020

Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation.
IEEE Trans. Parallel Distributed Syst., 2020

A Multi-Task Hardwired Accelerator for Face Detection and Alignment.
IEEE Trans. Circuits Syst. Video Technol., 2020

Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2020

A 60 Gb/s-Level Coarse-Grained Reconfigurable Cryptographic Processor With Less Than 1-W Power.
IEEE Trans. Circuits Syst. II Express Briefs, 2020

Efficient Scheduling of Irregular Network Structures on CNN Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Aggressive Fine-Grained Power Gating of NoC Buffers.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Dynamic Frequency Scaling Aware Opportunistic Through-Silicon-Via Inductor Utilization in Resonant Clocking.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication.
IEEE Trans. Computers, 2020

Approximate Arithmetic Circuits: A Survey, Characterization, and Recent Applications.
Proc. IEEE, 2020

A 2.92-Gb/s/W and 0.43-Gb/s/MG Flexible and Scalable CGRA-Based Baseband Processor for Massive MIMO Detection.
IEEE J. Solid State Circuits, 2020

A High-performance Hardware Implementation of Saber Based on Karatsuba Algorithm.
IACR Cryptol. ePrint Arch., 2020

A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications.
ACM Comput. Surv., 2020

TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

CATCAM: Constant-time Alteration Ternary CAM with Scalable In-Memory Architecture.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

A Reconfigurable Branch Predictor for Spatial Computing Architectures.
Proceedings of the ICDSP 2020: 4th International Conference on Digital Signal Processing, 2020

PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

CDRing: Reconfigurable Ring Architecture by Exploiting Cycle Decomposition of Torus Topology.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

A Time-Domain Computing-in-Memory based Processor using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2020

2019
An Energy-Efficient and Noise-Tolerant Recurrent Neural Network Using Stochastic Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2019

Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.
IEEE Trans. Parallel Distributed Syst., 2019

Face Alignment With Expression- and Pose-Based Adaptive Initialization.
IEEE Trans. Multim., 2019

Reconfigurable Architecture for Neural Approximation in Multimedia Computing.
IEEE Trans. Circuits Syst. Video Technol., 2019

A Face Alignment Accelerator Based on Optimized Coarse-to-Fine Shape Searching.
IEEE Trans. Circuits Syst. Video Technol., 2019

An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019

A Fast and Power-Efficient Hardware Architecture for Non-Maximum Suppression.
IEEE Trans. Circuits Syst. II Express Briefs, 2019

A High-Performance and Energy-Efficient FIR Adaptive Filter Using Approximate Distributed Arithmetic Circuits.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019

A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Low-Power Unsigned Divider and Square Root Circuit Designs Using Adaptive Approximation.
IEEE Trans. Computers, 2019

LnCompare: gene set feature analysis for human long non-coding RNAs.
Nucleic Acids Res., 2019

An Energy-Efficient Reconfigurable Processor for Binary-and Ternary-Weight Neural Networks With Flexible Data Bit Width.
IEEE J. Solid State Circuits, 2019

Optimal design of a low-power, phase-switching modulator for implantable medical applications.
Integr., 2019

A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions.
Proceedings of the 17th IEEE International New Circuits and Systems Conference, 2019

An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2019

FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Pj-AxMTJ: Process-in-memory with Joint Magnetization Switching for Approximate Computing in Magnetic Tunnel Junction.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

A Reliable Physical Unclonable Function Based on Differential Charging Capacitors.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms.
Proceedings of the International Conference on Computer-Aided Design, 2019

Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

Characterizing Approximate Adders and Multipliers Optimized under Different Design Constraints.
Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019

Constructing Concurrent Data Structures on FPGA with Channels.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A General Pattern-Based Dynamic Compilation Framework for Coarse-Grained Reconfigurable Architectures.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Small-Footprint Keyword Spotting with Graph Convolutional Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Massive MIMO Detection Algorithm and VLSI Architecture
Springer, ISBN: 978-981-13-6361-0, 2019

Approximate Arithmetic Circuits: Design and Evaluation.
Proceedings of the Approximate Circuits, Methodologies and CAD., 2019

2018
Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM.
IEEE Trans. Very Large Scale Integr. Syst., 2018

Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K -Best Detector for Large-Scale MIMO Systems.
IEEE Trans. Signal Process., 2018

Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2018

Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration.
IEEE Trans. Parallel Distributed Syst., 2018

A 1.58 Gbps/W 0.40 Gbps/mm2 ASIC Implementation of MMSE Detection for $128\times 8~64$ -QAM Massive MIMO in 65 nm CMOS.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018

A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018

HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

Memory Partitioning for Parallel Multipattern Data Access in Multiple Data Arrays.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications.
IEEE J. Solid State Circuits, 2018

FP-BNN: Binarized neural network on FPGA.
Neurocomputing, 2018

Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution.
IEEE Comput. Archit. Lett., 2018

Stochastic Analysis of Multiplex Boolean Networks for Understanding Epidemic Propagation.
IEEE Access, 2018

Multi-Bank Memory Aware Force Directed Scheduling for High-Level Synthesis.
IEEE Access, 2018

A 141 UW, 2.46 PJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28NM CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28NM CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Bit-width Adaptive Accelerator Design for Convolution Neural Network.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Adaptive approximation in arithmetic circuits: A low-power unsigned divider design.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018

An efficient kernel transformation architecture for binary- and ternary-weight neural network inference.
Proceedings of the 55th Annual Design Automation Conference, 2018

LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA.
Proceedings of the 55th Annual Design Automation Conference, 2018

A Full Multicast Reconfigurable Non-blocking Permutation Network.
Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2018

A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

2017
Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Low-Computing-Load, High-Parallelism Detection Method Based on Chebyshev Iteration for Massive MIMO Systems With VLSI Architecture.
IEEE Trans. Signal Process., 2017

Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory.
IEEE Trans. Parallel Distributed Syst., 2017

CIACP: A Correlation- and Iteration- Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2017

A Multi-Objective Model Oriented Mapping Approach for NoC-based Computing Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Exploration of Benes Network in Cryptographic Processors: A Random Infection Countermeasure for Block Ciphers Against Fault Attacks.
IEEE Trans. Inf. Forensics Secur., 2017

PMCC: Fast and Accurate System-Level Power Modeling for Processors on Heterogeneous SoC.
IEEE Trans. Circuits Syst. II Express Briefs, 2017

An AdaBoost-Based Face Detection System Using Parallel Configurable Architecture With Optimized Computation.
IEEE Syst. J., 2017

A Review, Classification, and Comparative Evaluation of Approximate Arithmetic Circuits.
ACM J. Emerg. Technol. Comput. Syst., 2017

Implementation of in-loop filter for HEVC decoder on reconfigurable processor.
IET Image Process., 2017

Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion.
IEEE Access, 2017

Multi-CNN and decision tree based driving behavior evaluation.
Proceedings of the Symposium on Applied Computing, 2017

AEPE: An area and power efficient RRAM crossbar-based accelerator for deep CNNs.
Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017

DFGNet: Mapping dataflow graph onto CGRA by a deep learning approach.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Memory fartitioning-based modulo scheduling for high-level synthesis.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

A Power Efficient Architecture with Optimized Parallel Memory Accessing for Feature Generation.
Proceedings of the on Great Lakes Symposium on VLSI 2017, 2017

Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Bit-Width Based Resource Partitioning for CNN Acceleration on FPGA.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

A 700fps Optimized Coarse-to-Fine Shape Searching Based Hardware Accelerator for Face Alignment.
Proceedings of the 54th Annual Design Automation Conference, 2017

Minimizing Pipeline Stalls in Distributed-Controlled Coarse-Grained Reconfigurable Arrays with Triggered Instruction Issue and Execution.
Proceedings of the 54th Annual Design Automation Conference, 2017

An efficient hardware design for cerebellar models using approximate circuits: special session paper.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017

Energy-aware loops mapping on multi-vdd CGRAs without performance degradation.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016
Trigger-Centric Loop Mapping on CGRAs.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2016

CWFP: Novel Collective Writeback and Fill Policy for Last-Level DRAM Cache.
IEEE Trans. Very Large Scale Integr. Syst., 2016

A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Reliability Evaluation of Phased-Mission Systems Using Stochastic Computation.
IEEE Trans. Reliab., 2016

Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Parallel Distributed Syst., 2016

TLIA: Efficient Reconfigurable Architecture for Control-Intensive Kernels with Triggered-Long-Instructions.
IEEE Trans. Parallel Distributed Syst., 2016

Against Double Fault Attacks: Injection Effort Model, Space and Time Randomization Based Countermeasures for Reconfigurable Array Architecture.
IEEE Trans. Inf. Forensics Secur., 2016

A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator.
IEEE Trans. Circuits Syst. Video Technol., 2016

A Fast and Power-Efficient Memory-Centric Architecture for Affine Computation.
IEEE Trans. Circuits Syst. II Express Briefs, 2016

Joint Modulo Scheduling and V<sub>dd</sub> Assignment for Loop Mapping on Dual- V<sub>dd</sub> CGRAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform.
IEICE Trans. Inf. Syst., 2016

Dynamically reconfigurable architecture for symmetric ciphers.
Sci. China Inf. Sci., 2016

A fast face detection architecture for auto-focus in smart-phones and digital cameras.
Sci. China Inf. Sci., 2016

A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration.
IEEE Comput. Archit. Lett., 2016

Energy management on DVS based coarse-grained reconfigurable platform.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2016

Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Multibank memory optimization for parallel data access in multiple data arrays.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
A Hybrid Reconfigurable Architecture and Design Methods Aiming at Control-Intensive Kernels.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Energy Management on Battery-Powered Coarse-Grained Reconfigurable Platforms.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A Flexible Energy- and Reliability-Aware Application Mapping for NoC-Based Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A Fault-Tolerant Technique Using Quadded Logic and Quadded Transistors.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm.
ACM Trans. Reconfigurable Technol. Syst., 2015

A Stochastic Approach for the Analysis of Dynamic Fault Trees With Spare Gates Under Probabilistic Common Cause Failures.
IEEE Trans. Reliab., 2015

Correction to "An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding".
IEEE Trans. Multim., 2015

An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding.
IEEE Trans. Multim., 2015

A real-time time-consistent 2D-to-3D video conversion system using color histogram.
IEEE Trans. Consumer Electron., 2015

A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency.
IEEE Trans. Circuits Syst. II Express Briefs, 2015

An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature.
Sensors, 2015

A Novel 2D-to-3D Video Conversion Method Using Time-Coherent Depth Maps.
Sensors, 2015

High-Performance Motion Estimation for Image Sensors with Video Compression.
Sensors, 2015

A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.
Sensors, 2015

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.
J. Circuits Syst. Comput., 2015

Low-Power Loop Parallelization onto CGRA Utilizing Variable Dual V<sub>DD</sub>.
IEICE Trans. Inf. Syst., 2015

The Implementation of Texture-Based Video Up-Scaling on Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2015

Battery-Aware Loop Nests Mapping for CGRAs.
IEICE Trans. Inf. Syst., 2015

Mapping Multi-Level Loop Nests onto CGRAs Using Polyhedral Optimizations.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015

Reliability-aware mapping for various NoC topologies and routing algorithms under performance constraints.
Sci. China Inf. Sci., 2015

A Multi-modal 2D + 3D Face Recognition Method with a Novel Local Feature Descriptor.
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015

A flexible and energy-efficient reconfigurable architecture for symmetric cipher processing.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Neural approximating architecture targeting multiple application domains.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Real-time time-consistent 2D-to-3D video conversion based on color histogram.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Efficient lane detection system based on monocular camera.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

An automatic depth map generation method by image classification.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Acceleration of Nested Conditionals on CGRAs via Trigger Scheme.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Novel Composite Method to Accelerate Control Flow on Reconfigurable Architecture (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

REPROC: A Dynamically Reconfigurable Architecture for Symmetric Cryptography (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Cooperatively managing dynamic writeback and insertion policies in a last-level DRAM cache.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Joint affine transformation and loop pipelining for mapping nested loop on CGRAs.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

RNA: a reconfigurable architecture for hardware neural acceleration.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Acceleration of control flows on reconfigurable architecture with a composite method.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Efficient memory partitioning for parallel data access in multidimensional arrays.
Proceedings of the 52nd Annual Design Automation Conference, 2015

A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction.
Proceedings of the 52nd Annual Design Automation Conference, 2015

A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

Battery-aware mapping optimization of loop nests for CGRAs.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

A novel approach using a minimum cost maximum flow algorithm for fault-tolerant topology reconfiguration in NoC architectures.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014
On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time.
IEEE Trans. Very Large Scale Integr. Syst., 2014

SimRPU: A Simulation Environment for Reconfigurable Architecture Exploration.
IEEE Trans. Very Large Scale Integr. Syst., 2014

A Stochastic Approach for the Analysis of Fault Trees With Priority AND Gates.
IEEE Trans. Reliab., 2014

An uneven-dual-core processor based mobile platform for facilitating the collaboration among various embedded electronic devices.
IEEE Trans. Consumer Electron., 2014

A Multi-Modal Face Recognition Method Using Complete Local Derivative Patterns and Depth Maps.
Sensors, 2014

MapReduce inspired loop mapping for coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2014

Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2014

Implementation of AVS Jizhun decoder with HW/SW partitioning on a coarse-grained reconfigurable multimedia system.
Sci. China Inf. Sci., 2014

Implementation of multi-standard video decoder on a heterogeneous coarse-grained reconfigurable processor.
Sci. China Inf. Sci., 2014

Optimization of speeded-up robust feature algorithm for hardware implementation.
Sci. China Inf. Sci., 2014

A fast and robust traffic sign recognition method using ring of RIBP histograms based feature.
Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD.
Proceedings of the IEEE 57th International Midwest Symposium on Circuits and Systems, 2014

A 65 nm uneven-dual-core SoC based platform for multi-device collaborative computing.
Proceedings of the IEEE International Symposium on Circuits and Systemss, 2014

A parallel hardware architecture for fast integral image computing.
Proceedings of the IEEE International Symposium on Circuits and Systemss, 2014

Map-reduce inspired loop parallelization on CGRA.
Proceedings of the IEEE International Symposium on Circuits and Systemss, 2014

Hierarchical Pipeline Optimization of Coarse Grained Reconfigurable Processor for Multimedia Applications.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

A FAST Extreme Illumination Robust Feature in Affine Space.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Teach Reconfigurable Computing using mixed-grained fabrics based hardware infrastructure.
Proceedings of the IEEE Frontiers in Education Conference, 2014

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Extending lifetime of battery-powered coarse-grained reconfigurable computing platforms.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Low-Power Reconfigurable Processor Utilizing Variable Dual V<sub>DD</sub>.
IEEE Trans. Circuits Syst. II Express Briefs, 2013

A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration.
J. Syst. Archit., 2013

Calibration Techniques for Low-Power Wireless Multiband Transceiver.
Int. J. Distributed Sens. Networks, 2013

Concurrent Detection and Recognition of Individual Object Based on Colour and p-SIFT Features.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

An Inductive-Coupling Interconnected Application-Specific 3D NoC Design.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Battery-Aware Task Mapping for Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2013

Affine Transformations for Communication and Reconfiguration Optimization of Mapping Loop Nests on CGRAs.
IEICE Trans. Inf. Syst., 2013

The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip.
IEICE Trans. Inf. Syst., 2013

A high-throughput fixed-point complex divider for FPGAs.
IEICE Electron. Express, 2013

An efficient VLSI architecture of speeded-up robust feature extraction for high resolution and high frame rate video.
Sci. China Inf. Sci., 2013

Hierarchical representation of on-chip context to reduce reconfiguration time and implementation area for coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2013

ReSSIM: a mixed-level simulator for dynamic coarse-grained reconfigurable processor.
Sci. China Inf. Sci., 2013

SPC: An Approach to Guarantee Performance in Cost Oriented Mapping Algorithm for NoC Architectures.
Proceedings of the IEEE Eighth International Conference on Networking, 2013

Battery-Aware MAC Analytical Modeling for Extending Lifetime of Low Duty-Cycled Wireless Sensor Network.
Proceedings of the IEEE Eighth International Conference on Networking, 2013

A VLSI architecture for enhancing the fault tolerance of NoC using quad-spare mesh topology and dynamic reconfiguration.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Affine transformations for communication and reconfiguration optimization of loops on CGRAs.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Implementation of multi-standard video decoding algorithms on a coarse-grained reconfigurable multimedia processor.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Mapping IDCT of MPEG2 on Coarse-Grained Reconfigurable Array for Matching 1080p Video Decoding.
Proceedings of the Advanced Technologies, Embedded and Multimedia for Human-centric Computing, 2013

Polyhedral model based mapping optimization of loop nests for CGRAs.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

SURFEX: A 57fps 1080P resolution 220mW silicon implementation for simplified speeded-up robust feature with 65nm process.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

2012
Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2012

Hybrid Wired/Wireless On-Chip Network Design for Application-Specific SoC.
IEICE Trans. Electron., 2012

Multi-Battery Scheduling for Battery-Powered DVS Systems.
IEICE Trans. Commun., 2012

Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture.
IEICE Trans. Inf. Syst., 2012

Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications.
IEICE Trans. Inf. Syst., 2012

Reducing configuration contexts for coarse-grained reconfigurable architecture.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

2011
A high efficient baseband transceiver for IEEE 802.15.4 LR-WPAN systems.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Performance evaluation modeling for reconfigurable processor.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010
An Implementation of Fast-Locking and Wide-Range 11-bit Reversible SAR DLL.
IEEE Trans. Circuits Syst. II Express Briefs, 2010

A Cycle-Accurate Simulator for a Reconfigurable Multi-Media System.
IEICE Trans. Inf. Syst., 2010

CropNET: A Wireless Multimedia Sensor Network for Agricultural Monitoring.
IEICE Trans. Commun., 2010

Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System.
IEICE Trans. Inf. Syst., 2010

User Behavior Pattern Analysis and Prediction Based on Mobile Phone Sensors.
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010

A reconfigurable multi-processor SoC for media applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

A VLSI design of sensor node for wireless image sensor network.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Parallel implementation of computing-intensive decoding algorithms of H.264 on reconfigurable SoC.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Battery aware tasks allocating algorithm for multi-battery operated system.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

Mixed-level modeling for network on chip infrastructure in SoC design.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

2009
Analog circuit optimization system based on hybrid evolutionary algorithms.
Integr., 2009

Compiler Framework for Reconfigurable Computing Architecture.
IEICE Trans. Electron., 2009

Buffer planning for application-specific networks-on-chip design.
Sci. China Ser. F Inf. Sci., 2009

A Fast-locking and Wide-range Reversible SAR DLL.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2009), 2009

2007
Battery-Aware Variable Voltage Scheduling on Real-Time Multiprocessor Platforms.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

2006
An ASIC Implementation of Lifting-Based 2-D Discrete Wavelet Transform.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2006, 2006

2005
An ASIC implementation of JPEG2000 codec.
Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005

2004
A VLSI architecture of JPEG2000 encoder.
IEEE J. Solid State Circuits, 2004

2002
A VLSI architecture of spatial combinative lifting algorithm based 2-D DWT/IDWT.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2002, 2002


  Loading...