Leibo Liu
Orcid: 0000-0001-7548-4116
According to our database, Leibo Liu authored at least 357 papers between 2002 and 2024.
Bibliography
2024
DNI-MDCAP: improvement of causal MiRNA-disease association prediction based on deep network imputation.
BMC Bioinform., December, 2024
Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.
IEEE J. Solid State Circuits, October, 2024
CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.
IEEE J. Solid State Circuits, October, 2024
Learning the Error Features of Approximate Multipliers for Neural Network Applications.
IEEE Trans. Computers, March, 2024
A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm.
IEEE Trans. Parallel Distributed Syst., February, 2024
Hardware-Efficient Logarithmic Floating-Point Multipliers for Error-Tolerant Applications.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2024
MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.
IEEE J. Solid State Circuits, January, 2024
Breaking Ground: A New Area Record for Low-Latency First-Order Masked SHA-3 Advancing from the 4x Area Era to the 3x Area Era.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024
UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024
Harmonising the Clinical Melody: Tuning Large Language Models for Hospital Course Summarisation in Clinical Coding.
CoRR, 2024
CATCAM: a 28 nm constant-time alteration TCAM enabling less than 50 ns update latency.
Sci. China Inf. Sci., 2024
16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024
Sparse Polynomial Multiplication-Based High-Performance Hardware Implementation for CRYSTALS-Dilithium.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2024
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024
Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering.
IEEE Trans. Parallel Distributed Syst., December, 2023
Artif. Intell. Medicine, October, 2023
M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., September, 2023
Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators.
J. Comput. Sci. Technol., April, 2023
Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023
TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization.
IEEE J. Solid State Circuits, March, 2023
SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023
Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records.
PhD thesis, 2023
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023
IEEE Trans. Circuits Syst. I Regul. Pap., 2023
SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023
RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems.
IEEE J. Solid State Circuits, 2023
An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention.
IEEE J. Solid State Circuits, 2023
ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.
IEEE J. Solid State Circuits, 2023
TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.
IEEE J. Solid State Circuits, 2023
CoRR, 2023
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
CoRR, 2023
A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023
CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023
CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023
A 10TFLOPS Datacenter-Oriented GPU with 4-Corner Stacked 64GB Memory by The Means of 2.5D Packaging Technology.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023
TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023
2022
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022
CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
Characterizing Approximate Adders and Multipliers for Mitigating Aging and Temperature Degradations.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022
Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Security Oriented Design Framework for EM Side-Channel Protection in RTL Implementations.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022
Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.
IEEE J. Solid State Circuits, 2022
A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
IEEE J. Solid State Circuits, 2022
De-identifying Australian hospital discharge summaries: An end-to-end framework using ensemble of deep learning models.
J. Biomed. Informatics, 2022
J. Biomed. Informatics, 2022
Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece.
IACR Cryptol. ePrint Arch., 2022
CoRR, 2022
FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences.
CoRR, 2022
An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance.
Sci. China Inf. Sci., 2022
A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022
CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit.
Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022
Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Efficient access scheme for multi-bank based NTT architecture through conflict graph.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022
Proceedings of the Approximate Computing, 2022
2021
IEEE Trans. Parallel Distributed Syst., 2021
A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment.
IEEE Trans. Multim., 2021
IEEE Trans. Circuits Syst. I Regul. Pap., 2021
Non-Volatile Approximate Arithmetic Circuits Using Scalable Hybrid Spin-CMOS Majority Gates.
IEEE Trans. Circuits Syst. I Regul. Pap., 2021
Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
Security-Driven Placement and Routing Tools for Electromagnetic Side-Channel Protection.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
On-Chip Trust Evaluation Utilizing TDC-Based Parameter-Adjustable Security Primitive.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021
Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters.
IEEE J. Solid State Circuits, 2021
TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
IEEE J. Solid State Circuits, 2021
Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".
IEEE J. Solid State Circuits, 2021
Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.
IEEE J. Solid State Circuits, 2021
De-identifying Hospital Discharge Summaries: An End-to-End Framework using Ensemble of De-Identifiers.
CoRR, 2021
Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers.
Sci. China Inf. Sci., 2021
A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021
A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021
9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021
15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021
ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
A Logarithmic Floating-Point Multiplier for the Efficient Training of Neural Networks.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021
ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021
A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021
2020
IEEE Trans. Wirel. Commun., 2020
Energy- and Area-Efficient Recursive-Conjugate-Gradient-Based MMSE Detector for Massive MIMO Systems.
IEEE Trans. Signal Process., 2020
IEEE Trans. Parallel Distributed Syst., 2020
Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation.
IEEE Trans. Parallel Distributed Syst., 2020
IEEE Trans. Circuits Syst. Video Technol., 2020
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2020
A 60 Gb/s-Level Coarse-Grained Reconfigurable Cryptographic Processor With Less Than 1-W Power.
IEEE Trans. Circuits Syst. II Express Briefs, 2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
Dynamic Frequency Scaling Aware Opportunistic Through-Silicon-Via Inductor Utilization in Resonant Clocking.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020
NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication.
IEEE Trans. Computers, 2020
Approximate Arithmetic Circuits: A Survey, Characterization, and Recent Applications.
Proc. IEEE, 2020
A 2.92-Gb/s/W and 0.43-Gb/s/MG Flexible and Scalable CGRA-Based Baseband Processor for Massive MIMO Detection.
IEEE J. Solid State Circuits, 2020
IACR Cryptol. ePrint Arch., 2020
A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications.
ACM Comput. Surv., 2020
TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020
Proceedings of the ICDSP 2020: 4th International Conference on Digital Signal Processing, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020
STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
CDRing: Reconfigurable Ring Architecture by Exploiting Cycle Decomposition of Torus Topology.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
A Time-Domain Computing-in-Memory based Processor using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2020
2019
An Energy-Efficient and Noise-Tolerant Recurrent Neural Network Using Stochastic Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2019
Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.
IEEE Trans. Parallel Distributed Syst., 2019
IEEE Trans. Multim., 2019
IEEE Trans. Circuits Syst. Video Technol., 2019
IEEE Trans. Circuits Syst. Video Technol., 2019
An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019
IEEE Trans. Circuits Syst. II Express Briefs, 2019
A High-Performance and Energy-Efficient FIR Adaptive Filter Using Approximate Distributed Arithmetic Circuits.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019
A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019
Low-Power Unsigned Divider and Square Root Circuit Designs Using Adaptive Approximation.
IEEE Trans. Computers, 2019
Nucleic Acids Res., 2019
An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width.
IEEE J. Solid State Circuits, 2019
Optimal design of a low-power, phase-switching modulator for implantable medical applications.
Integr., 2019
A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019
Proceedings of the 17th IEEE International New Circuits and Systems Conference, 2019
An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2019
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019
Pj-AxMTJ: Process-in-memory with Joint Magnetization Switching for Approximate Computing in Magnetic Tunnel Junction.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019
ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms.
Proceedings of the International Conference on Computer-Aided Design, 2019
Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019
Characterizing Approximate Adders and Multipliers Optimized under Different Design Constraints.
Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
A General Pattern-Based Dynamic Compilation Framework for Coarse-Grained Reconfigurable Architectures.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Springer, ISBN: 978-981-13-6361-0, 2019
Proceedings of the Approximate Circuits, Methodologies and CAD., 2019
2018
Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM.
IEEE Trans. Very Large Scale Integr. Syst., 2018
Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K-Best Detector for Large-Scale MIMO Systems.
IEEE Trans. Signal Process., 2018
Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2018
IEEE Trans. Parallel Distributed Syst., 2018
A 1.58 Gbps/W 0.40 Gbps/mm² ASIC Implementation of MMSE Detection for 128×8 64-QAM Massive MIMO in 65 nm CMOS.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018
A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018
HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing.
IEEE Trans. Circuits Syst. II Express Briefs, 2018
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
Gradient Descent Using Stochastic Circuits for Efficient Training of Learning Machines.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications.
IEEE J. Solid State Circuits, 2018
IEEE Comput. Archit. Lett., 2018
Stochastic Analysis of Multiplex Boolean Networks for Understanding Epidemic Propagation.
IEEE Access, 2018
IEEE Access, 2018
A 141 µW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018
An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018
An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
An efficient kernel transformation architecture for binary- and ternary-weight neural network inference.
Proceedings of the 55th Annual Design Automation Conference, 2018
LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA.
Proceedings of the 55th Annual Design Automation Conference, 2018
Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2018
A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018
2017
Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns.
IEEE Trans. Very Large Scale Integr. Syst., 2017
Low-Computing-Load, High-Parallelism Detection Method Based on Chebyshev Iteration for Massive MIMO Systems With VLSI Architecture.
IEEE Trans. Signal Process., 2017
Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory.
IEEE Trans. Parallel Distributed Syst., 2017
CIACP: A Correlation- and Iteration-Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2017
IEEE Trans. Parallel Distributed Syst., 2017
Exploration of Benes Network in Cryptographic Processors: A Random Infection Countermeasure for Block Ciphers Against Fault Attacks.
IEEE Trans. Inf. Forensics Secur., 2017
PMCC: Fast and Accurate System-Level Power Modeling for Processors on Heterogeneous SoC.
IEEE Trans. Circuits Syst. II Express Briefs, 2017
An AdaBoost-Based Face Detection System Using Parallel Configurable Architecture With Optimized Computation.
IEEE Syst. J., 2017
A Review, Classification, and Comparative Evaluation of Approximate Arithmetic Circuits.
ACM J. Emerg. Technol. Comput. Syst., 2017
IET Image Process., 2017
IEEE Access, 2017
Proceedings of the Symposium on Applied Computing, 2017
Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
A Power Efficient Architecture with Optimized Parallel Memory Accessing for Feature Generation.
Proceedings of the on Great Lakes Symposium on VLSI 2017, 2017
Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017
A 700fps Optimized Coarse-to-Fine Shape Searching Based Hardware Accelerator for Face Alignment.
Proceedings of the 54th Annual Design Automation Conference, 2017
Minimizing Pipeline Stalls in Distributed-Controlled Coarse-Grained Reconfigurable Arrays with Triggered Instruction Issue and Execution.
Proceedings of the 54th Annual Design Automation Conference, 2017
An efficient hardware design for cerebellar models using approximate circuits: special session paper.
Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion, 2017
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017
2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Very Large Scale Integr. Syst., 2016
IEEE Trans. Reliab., 2016
Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Parallel Distributed Syst., 2016
TLIA: Efficient Reconfigurable Architecture for Control-Intensive Kernels with Triggered-Long-Instructions.
IEEE Trans. Parallel Distributed Syst., 2016
Against Double Fault Attacks: Injection Effort Model, Space and Time Randomization Based Countermeasures for Reconfigurable Array Architecture.
IEEE Trans. Inf. Forensics Secur., 2016
A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator.
IEEE Trans. Circuits Syst. Video Technol., 2016
IEEE Trans. Circuits Syst. II Express Briefs, 2016
Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016
An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform.
IEICE Trans. Inf. Syst., 2016
Sci. China Inf. Sci., 2016
A fast face detection architecture for auto-focus in smart-phones and digital cameras.
Sci. China Inf. Sci., 2016
A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration.
IEEE Comput. Archit. Lett., 2016
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2016
Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays.
Proceedings of the 53rd Annual Design Automation Conference, 2016
Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016
2015
A Hybrid Reconfigurable Architecture and Design Methods Aiming at Control-Intensive Kernels.
IEEE Trans. Very Large Scale Integr. Syst., 2015
IEEE Trans. Very Large Scale Integr. Syst., 2015
Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015
A Flexible Energy- and Reliability-Aware Application Mapping for NoC-Based Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015
IEEE Trans. Very Large Scale Integr. Syst., 2015
ACM Trans. Reconfigurable Technol. Syst., 2015
A Stochastic Approach for the Analysis of Dynamic Fault Trees With Spare Gates Under Probabilistic Common Cause Failures.
IEEE Trans. Reliab., 2015
Correction to "An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding".
IEEE Trans. Multim., 2015
An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding.
IEEE Trans. Multim., 2015
IEEE Trans. Consumer Electron., 2015
A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency.
IEEE Trans. Circuits Syst. II Express Briefs, 2015
An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015
Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature.
Sensors, 2015
Sensors, 2015
Sensors, 2015
A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.
Sensors, 2015
Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.
J. Circuits Syst. Comput., 2015
IEICE Trans. Inf. Syst., 2015
The Implementation of Texture-Based Video Up-Scaling on Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2015
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015
Reliability-aware mapping for various NoC topologies and routing algorithms under performance constraints.
Sci. China Inf. Sci., 2015
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015
A flexible and energy-efficient reconfigurable architecture for symmetric cipher processing.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015
Proceedings of the IEEE International Conference on Consumer Electronics, 2015
Proceedings of the IEEE International Conference on Consumer Electronics, 2015
Proceedings of the IEEE International Conference on Consumer Electronics, 2015
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015
Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
A Novel Composite Method to Accelerate Control Flow on Reconfigurable Architecture (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
REPROC: A Dynamically Reconfigurable Architecture for Symmetric Cryptography (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015
Cooperatively managing dynamic writeback and insertion policies in a last-level DRAM cache.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015
Acceleration of control flows on reconfigurable architecture with a composite method.
Proceedings of the 52nd Annual Design Automation Conference, 2015
Proceedings of the 52nd Annual Design Automation Conference, 2015
A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction.
Proceedings of the 52nd Annual Design Automation Conference, 2015
A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015
A novel approach using a minimum cost maximum flow algorithm for fault-tolerant topology reconfiguration in NoC architectures.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015
2014
On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time.
IEEE Trans. Very Large Scale Integr. Syst., 2014
IEEE Trans. Very Large Scale Integr. Syst., 2014
IEEE Trans. Reliab., 2014
An uneven-dual-core processor based mobile platform for facilitating the collaboration among various embedded electronic devices.
IEEE Trans. Consumer Electron., 2014
A Multi-Modal Face Recognition Method Using Complete Local Derivative Patterns and Depth Maps.
Sensors, 2014
Sci. China Inf. Sci., 2014
Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2014
Implementation of AVS Jizhun decoder with HW/SW partitioning on a coarse-grained reconfigurable multimedia system.
Sci. China Inf. Sci., 2014
Implementation of multi-standard video decoder on a heterogeneous coarse-grained reconfigurable processor.
Sci. China Inf. Sci., 2014
Sci. China Inf. Sci., 2014
A fast and robust traffic sign recognition method using ring of RIBP histograms based feature.
Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014
Proceedings of the IEEE 57th International Midwest Symposium on Circuits and Systems, 2014
A 65 nm uneven-dual-core SoC based platform for multi-device collaborative computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014
Hierarchical Pipeline Optimization of Coarse Grained Reconfigurable Processor for Multimedia Applications.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 22nd International Conference on Pattern Recognition, 2014
Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014
Teach Reconfigurable Computing using mixed-grained fabrics based hardware infrastructure.
Proceedings of the IEEE Frontiers in Education Conference, 2014
Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014
Extending lifetime of battery-powered coarse-grained reconfigurable computing platforms.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
2013
IEEE Trans. Circuits Syst. II Express Briefs, 2013
A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration.
J. Syst. Archit., 2013
Int. J. Distributed Sens. Networks, 2013
Concurrent Detection and Recognition of Individual Object Based on Colour and p-SIFT Features.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
IEICE Trans. Inf. Syst., 2013
Affine Transformations for Communication and Reconfiguration Optimization of Mapping Loop Nests on CGRAs.
IEICE Trans. Inf. Syst., 2013
The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013
Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip.
IEICE Trans. Inf. Syst., 2013
IEICE Electron. Express, 2013
An efficient VLSI architecture of speeded-up robust feature extraction for high resolution and high frame rate video.
Sci. China Inf. Sci., 2013
Hierarchical representation of on-chip context to reduce reconfiguration time and implementation area for coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2013
Sci. China Inf. Sci., 2013
SPC: An Approach to Guarantee Performance in Cost Oriented Mapping Algorithm for NoC Architectures.
Proceedings of the IEEE Eighth International Conference on Networking, 2013
Battery-Aware MAC Analytical Modeling for Extending Lifetime of Low Duty-Cycled Wireless Sensor Network.
Proceedings of the IEEE Eighth International Conference on Networking, 2013
A VLSI architecture for enhancing the fault tolerance of NoC using quad-spare mesh topology and dynamic reconfiguration.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Affine transformations for communication and reconfiguration optimization of loops on CGRAs.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Implementation of multi-standard video decoding algorithms on a coarse-grained reconfigurable multimedia processor.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Mapping IDCT of MPEG2 on Coarse-Grained Reconfigurable Array for Matching 1080p Video Decoding.
Proceedings of the Advanced Technologies, Embedded and Multimedia for Human-centric Computing, 2013
Proceedings of the 50th Annual Design Automation Conference 2013, 2013
SURFEX: A 57fps 1080P resolution 220mW silicon implementation for simplified speeded-up robust feature with 65nm process.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013
An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013
2012
IEICE Trans. Inf. Syst., 2012
IEICE Trans. Electron., 2012
IEICE Trans. Inf. Syst., 2012
Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications.
IEICE Trans. Inf. Syst., 2012
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012
2011
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011
2010
IEEE Trans. Circuits Syst. II Express Briefs, 2010
IEICE Trans. Inf. Syst., 2010
IEICE Trans. Commun., 2010
Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System.
IEICE Trans. Inf. Syst., 2010
Proceedings of the Network and Parallel Computing, IFIP International Conference, 2010
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010
Parallel implementation of computing-intensive decoding algorithms of H.264 on reconfigurable SoC.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010
2009
Integr., 2009
IEICE Trans. Electron., 2009
Sci. China Ser. F Inf. Sci., 2009
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2009), 2009
2007
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007
2006
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2006, 2006
2005
Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005
2004
2002
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems 2002, 2002