Shaojun Wei

Orcid: 0000-0001-5117-7920

According to our database, Shaojun Wei authored at least 363 papers between 2003 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow.
IEEE J. Solid State Circuits, October, 2024

CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering.
IEEE J. Solid State Circuits, October, 2024

A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm.
IEEE Trans. Parallel Distributed Syst., February, 2024

MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity.
IEEE J. Solid State Circuits, January, 2024

Breaking Ground: A New Area Record for Low-Latency First-Order Masked SHA-3 Advancing from the 4x Area Era to the 3x Area Era.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

A Low-Latency High-Order Arithmetic to Boolean Masking Conversion.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2024

Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture.
CoRR, 2024

SWG: an architecture for sparse weight gradient computation.
Sci. China Inf. Sci., 2024

CATCAM: a 28 nm constant-time alteration TCAM enabling less than 50 ns update latency.
Sci. China Inf. Sci., 2024

A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm²/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction.
Proceedings of the IEEE Symposium on VLSI Technology and Circuits 2024, 2024

16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models.
Proceedings of the IEEE International Solid-State Circuits Conference, 2024

Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Sparse Polynomial Multiplication-Based High-Performance Hardware Implementation for CRYSTALS-Dilithium.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2024

CAP: A General Purpose Computation-in-memory with Content Addressable Processing Paradigm.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Research on Performance Optimization of Encryption Algorithms for Network Security Framework.
Proceedings of the 2024 3rd International Conference on Cyber Security, 2024

Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

RCPE: An Excellent Performance Training Processor with RISC-V based Compression Mechanism.
Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024

RTPE: A High Energy Efficiency Inference Processor with RISC-V based Transformation Mechanism.
Proceedings of the 6th IEEE International Conference on AI Circuits and Systems, 2024

2023
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering.
IEEE Trans. Parallel Distributed Syst., December, 2023

M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., September, 2023

Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips.
IEEE Trans. Circuits Syst. I Regul. Pap., March, 2023

TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization.
IEEE J. Solid State Circuits, March, 2023

SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization.
IEEE Trans. Circuits Syst. I Regul. Pap., January, 2023

A Closer Look at the Chaotic Ring Oscillators based TRNG Design.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2023

STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition.
IEEE Trans. Circuits Syst. I Regul. Pap., 2023

SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2023

RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems.
IEEE J. Solid State Circuits, 2023

An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention.
IEEE J. Solid State Circuits, 2023

ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration.
IEEE J. Solid State Circuits, 2023

TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes.
IEEE J. Solid State Circuits, 2023

Wafer-scale Computing: Advancements, Challenges, and Future Perspectives.
CoRR, 2023

WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow.
CoRR, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
CoRR, 2023

A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing.
Proceedings of the 2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2023

CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

MulTCIM: A 28nm 2.24μJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers.
Proceedings of the IEEE International Solid-State Circuits Conference, 2023

Shogun: A Task Scheduling Framework for Graph Mining Accelerators.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

A Low-Randomness First-Order Masked Xoodyak.
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023

Mckeycutter: A High-throughput Key Generator of Classic McEliece on Hardware.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2023

TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions.
Proceedings of the 5th IEEE International Conference on Artificial Intelligence Circuits and Systems, 2023

2022
A Compact and High-Performance Hardware Architecture for CRYSTALS-Dilithium.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2022

BR-CIM: An Efficient Binary Representation Computation-In-Memory Design.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing.
IEEE Trans. Circuits Syst. I Regul. Pap., 2022

Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Efficient FHE Radix-2 Arithmetic Operations Based on Redundant Encoding.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022

Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning.
IEEE J. Solid State Circuits, 2022

A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
IEEE J. Solid State Circuits, 2022

Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece.
IACR Cryptol. ePrint Arch., 2022

HQNAS: Auto CNN deployment framework for joint quantization and architecture search.
CoRR, 2022

FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences.
CoRR, 2022

An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance.
Sci. China Inf. Sci., 2022

A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration.
Proceedings of the IEEE International Solid-State Circuits Conference, 2022

CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

A SHA-512 Hardware Implementation Based on Block RAM Storage Structure.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit.
Proceedings of the 2022 IEEE International Conference on Integrated Circuits, 2022

Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Mixed-granularity parallel coarse-grained reconfigurable architecture.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Efficient access scheme for multi-bank based NTT architecture through conflict graph.
Proceedings of the DAC '22: 59th ACM/IEEE Design Automation Conference, San Francisco, California, USA, July 10, 2022

Software Defined Chips - Volume I, 2
Springer, ISBN: 978-981-19-6993-5, 2022

2021
An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Parallel Distributed Syst., 2021

A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment.
IEEE Trans. Multim., 2021

LWRpro: An Energy-Efficient Configurable Crypto-Processor for Module-LWR.
IEEE Trans. Circuits Syst. I Regul. Pap., 2021

Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

A Design Flow for Click-Based Asynchronous Circuits Design With Conventional EDA Tools.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2021

Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters.
IEEE J. Solid State Circuits, 2021

TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
IEEE J. Solid State Circuits, 2021

Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning".
IEEE J. Solid State Circuits, 2021

Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning.
IEEE J. Solid State Circuits, 2021

Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers.
Sci. China Inf. Sci., 2021

A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021, 2021

A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation.
Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021, 2021

9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization.
Proceedings of the IEEE International Solid-State Circuits Conference, 2021

ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

FuseKNA: Fused Kernel Convolution based Accelerator for Deep Neural Networks.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

HeteroKV: A Scalable Line-rate Key-Value Store on Heterogeneous CPU-FPGA Platforms.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning.
Proceedings of the 27th IEEE International Symposium on Asynchronous Circuits and Systems, 2021

A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs.
Proceedings of the ASPDAC '21: 26th Asia and South Pacific Design Automation Conference, 2021

HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training.
Proceedings of the 3rd IEEE International Conference on Artificial Intelligence Circuits and Systems, 2021

2020
Near-Optimal MIMO-SCMA Uplink Detection With Low-Complexity Expectation Propagation.
IEEE Trans. Wirel. Commun., 2020

Energy- and Area-Efficient Recursive-Conjugate-Gradient-Based MMSE Detector for Massive MIMO Systems.
IEEE Trans. Signal Process., 2020

Achieving Flexible Global Reconfiguration in NoCs Using Reconfigurable Rings.
IEEE Trans. Parallel Distributed Syst., 2020

Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation.
IEEE Trans. Parallel Distributed Syst., 2020

A Multi-Task Hardwired Accelerator for Face Detection and Alignment.
IEEE Trans. Circuits Syst. Video Technol., 2020

Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT.
IACR Trans. Cryptogr. Hardw. Embed. Syst., 2020

A 4K × 2K@60fps Multifunctional Video Display Processor for High Perceptual Image Quality.
IEEE Trans. Circuits Syst. I Regul. Pap., 2020

A 60 Gb/s-Level Coarse-Grained Reconfigurable Cryptographic Processor With Less Than 1-W Power.
IEEE Trans. Circuits Syst. II Express Briefs, 2020

Efficient Scheduling of Irregular Network Structures on CNN Accelerators.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Aggressive Fine-Grained Power Gating of NoC Buffers.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication.
IEEE Trans. Computers, 2020

A 2.92-Gb/s/W and 0.43-Gb/s/MG Flexible and Scalable CGRA-Based Baseband Processor for Massive MIMO Detection.
IEEE J. Solid State Circuits, 2020

A High-performance Hardware Implementation of Saber Based on Karatsuba Algorithm.
IACR Cryptol. ePrint Arch., 2020

A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications.
ACM Comput. Surv., 2020

TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

CATCAM: Constant-time Alteration Ternary CAM with Scalable In-Memory Architecture.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

A Reconfigurable Branch Predictor for Spatial Computing Architectures.
Proceedings of the ICDSP 2020: 4th International Conference on Digital Signal Processing, 2020

PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs.
Proceedings of the 28th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2020

STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

CDRing: Reconfigurable Ring Architecture by Exploiting Cycle Decomposition of Torus Topology.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

TAEM: Fast Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

A Time-Domain Computing-in-Memory based Processor using Predictable Decomposed Convolution for Arbitrary Quantized DNNs.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2020

2019
Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory.
IEEE Trans. Parallel Distributed Syst., 2019

Face Alignment With Expression- and Pose-Based Adaptive Initialization.
IEEE Trans. Multim., 2019

Reconfigurable Architecture for Neural Approximation in Multimedia Computing.
IEEE Trans. Circuits Syst. Video Technol., 2019

A Face Alignment Accelerator Based on Optimized Coarse-to-Fine Shape Searching.
IEEE Trans. Circuits Syst. Video Technol., 2019

An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning.
IEEE Trans. Circuits Syst. I Regul. Pap., 2019

A Fast and Power-Efficient Hardware Architecture for Non-Maximum Suppression.
IEEE Trans. Circuits Syst. II Express Briefs, 2019

A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2019

An STT-MRAM Based in Memory Architecture for Low Power Integral Computing.
IEEE Trans. Computers, 2019

An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width.
IEEE J. Solid State Circuits, 2019

A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS.
Proceedings of the 2019 Symposium on VLSI Circuits, Kyoto, Japan, June 9-14, 2019, 2019

MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions.
Proceedings of the 17th IEEE International New Circuits and Systems Conference, 2019

An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2019

FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Towards Efficient Compact Network Training on Edge-Devices.
Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI, 2019

A Reliable Physical Unclonable Function Based on Differential Charging Capacitors.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms.
Proceedings of the International Conference on Computer-Aided Design, 2019

Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor.
Proceedings of the 2019 IEEE Hot Chips 31 Symposium (HCS), 2019

A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural Convolutional Network.
Proceedings of the 2019 on Great Lakes Symposium on VLSI, 2019

Constructing Concurrent Data Structures on FPGA with Channels.
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

A General Pattern-Based Dynamic Compilation Framework for Coarse-Grained Reconfigurable Architectures.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2019

Small-Footprint Keyword Spotting with Graph Convolutional Network.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Massive MIMO Detection Algorithm and VLSI Architecture
Springer, ISBN: 978-981-13-6361-0, 2019

2018
Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM.
IEEE Trans. Very Large Scale Integr. Syst., 2018

Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K-Best Detector for Large-Scale MIMO Systems.
IEEE Trans. Signal Process., 2018

Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2018

Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration.
IEEE Trans. Parallel Distributed Syst., 2018

A 1.58 Gbps/W 0.40 Gbps/mm² ASIC Implementation of MMSE Detection for 128×8 64-QAM Massive MIMO in 65 nm CMOS.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018

A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT.
IEEE Trans. Circuits Syst. I Regul. Pap., 2018

HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing.
IEEE Trans. Circuits Syst. II Express Briefs, 2018

Memory Partitioning for Parallel Multipattern Data Access in Multiple Data Arrays.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018

A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications.
IEEE J. Solid State Circuits, 2018

Optimization of Softmax Layer in Deep Neural Network Using Integral Stochastic Computation.
J. Low Power Electron., 2018

FP-BNN: Binarized neural network on FPGA.
Neurocomputing, 2018

Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution.
IEEE Comput. Archit. Lett., 2018

Multi-Bank Memory Aware Force Directed Scheduling for High-Level Synthesis.
IEEE Access, 2018

A 141 μW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS.
Proceedings of the 2018 IEEE Symposium on VLSI Circuits, 2018

An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

Bit-width Adaptive Accelerator Design for Convolution Neural Network.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2018

RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Efficient Hardware Architecture of Softmax Layer in Deep Neural Network.
Proceedings of the 23rd IEEE International Conference on Digital Signal Processing, 2018

An efficient kernel transformation architecture for binary- and ternary-weight neural network inference.
Proceedings of the 55th Annual Design Automation Conference, 2018

LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA.
Proceedings of the 55th Annual Design Automation Conference, 2018

A Full Multicast Reconfigurable Non-blocking Permutation Network.
Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, 2018

A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

An Asynchronous Energy-Efficient CNN Accelerator with Reconfigurable Architecture.
Proceedings of the IEEE Asian Solid-State Circuits Conference, 2018

A 4K×2K@60fps Multi-format Multi-function Display Processor for High Perceptual Quality.
Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems, 2018

2017
Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns.
IEEE Trans. Very Large Scale Integr. Syst., 2017

Low-Computing-Load, High-Parallelism Detection Method Based on Chebyshev Iteration for Massive MIMO Systems With VLSI Architecture.
IEEE Trans. Signal Process., 2017

Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory.
IEEE Trans. Parallel Distributed Syst., 2017

CIACP: A Correlation- and Iteration- Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays.
IEEE Trans. Parallel Distributed Syst., 2017

A Multi-Objective Model Oriented Mapping Approach for NoC-based Computing Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Exploration of Benes Network in Cryptographic Processors: A Random Infection Countermeasure for Block Ciphers Against Fault Attacks.
IEEE Trans. Inf. Forensics Secur., 2017

PMCC: Fast and Accurate System-Level Power Modeling for Processors on Heterogeneous SoC.
IEEE Trans. Circuits Syst. II Express Briefs, 2017

An AdaBoost-Based Face Detection System Using Parallel Configurable Architecture With Optimized Computation.
IEEE Syst. J., 2017

Implementation of in-loop filter for HEVC decoder on reconfigurable processor.
IET Image Process., 2017

Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion.
IEEE Access, 2017

Multi-CNN and decision tree based driving behavior evaluation.
Proceedings of the Symposium on Applied Computing, 2017

AEPE: An area and power efficient RRAM crossbar-based accelerator for deep CNNs.
Proceedings of the IEEE 6th Non-Volatile Memory Systems and Applications Symposium, 2017

DFGNet: Mapping dataflow graph onto CGRA by a deep learning approach.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Memory partitioning-based modulo scheduling for high-level synthesis.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

A Power Efficient Architecture with Optimized Parallel Memory Accessing for Feature Generation.
Proceedings of the Great Lakes Symposium on VLSI 2017, 2017

Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only).
Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

Bit-Width Based Resource Partitioning for CNN Acceleration on FPGA.
Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

Disturbance Aware Memory Partitioning for Parallel Data Access in STT-RAM.
Proceedings of the 54th Annual Design Automation Conference, 2017

A 700fps Optimized Coarse-to-Fine Shape Searching Based Hardware Accelerator for Face Alignment.
Proceedings of the 54th Annual Design Automation Conference, 2017

A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications.
Proceedings of the 54th Annual Design Automation Conference, 2017

Minimizing Pipeline Stalls in Distributed-Controlled Coarse-Grained Reconfigurable Arrays with Triggered Instruction Issue and Execution.
Proceedings of the 54th Annual Design Automation Conference, 2017

Stress-Aware Loops Mapping on CGRAs with Considering NBTI Aging Effect.
Proceedings of the 54th Annual Design Automation Conference, 2017

Energy-aware loops mapping on multi-vdd CGRAs without performance degradation.
Proceedings of the 22nd Asia and South Pacific Design Automation Conference, 2017

2016
Trigger-Centric Loop Mapping on CGRAs.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2016

CWFP: Novel Collective Writeback and Fill Policy for Last-Level DRAM Cache.
IEEE Trans. Very Large Scale Integr. Syst., 2016

A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Parallel Distributed Syst., 2016

TLIA: Efficient Reconfigurable Architecture for Control-Intensive Kernels with Triggered-Long-Instructions.
IEEE Trans. Parallel Distributed Syst., 2016

Against Double Fault Attacks: Injection Effort Model, Space and Time Randomization Based Countermeasures for Reconfigurable Array Architecture.
IEEE Trans. Inf. Forensics Secur., 2016

A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator.
IEEE Trans. Circuits Syst. Video Technol., 2016

A Fast and Power-Efficient Memory-Centric Architecture for Affine Computation.
IEEE Trans. Circuits Syst. II Express Briefs, 2016

Joint Modulo Scheduling and V<sub>dd</sub> Assignment for Loop Mapping on Dual-V<sub>dd</sub> CGRAs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2016

A pipelined area-efficient and high-speed reconfigurable processor for floating-point FFT/IFFT and DCT/IDCT computations.
Microelectron. J., 2016

Temperature-aware multi-application mapping on network-on-chip based many-core systems.
Microprocess. Microsystems, 2016

An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform.
IEICE Trans. Inf. Syst., 2016

A fast face detection architecture for auto-focus in smart-phones and digital cameras.
Sci. China Inf. Sci., 2016

A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration.
IEEE Comput. Archit. Lett., 2016

Energy management on DVS based coarse-grained reconfigurable platform.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2016

Temperature-aware task scheduling heuristics on Network-on-Chips.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Multibank memory optimization for parallel data access in multiple data arrays.
Proceedings of the 35th International Conference on Computer-Aided Design, 2016

Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays.
Proceedings of the 53rd Annual Design Automation Conference, 2016

Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures.
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016

2015
A Hybrid Reconfigurable Architecture and Design Methods Aiming at Control-Intensive Kernels.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Energy Management on Battery-Powered Coarse-Grained Reconfigurable Platforms.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A Flexible Energy- and Reliability-Aware Application Mapping for NoC-Based Reconfigurable Architectures.
IEEE Trans. Very Large Scale Integr. Syst., 2015

A Low-Latency and Low-Power Hybrid Scheme for On-Chip Networks.
IEEE Trans. Very Large Scale Integr. Syst., 2015

Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm.
ACM Trans. Reconfigurable Technol. Syst., 2015

Correction to "An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding".
IEEE Trans. Multim., 2015

An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding.
IEEE Trans. Multim., 2015

A real-time time-consistent 2D-to-3D video conversion system using color histogram.
IEEE Trans. Consumer Electron., 2015

A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency.
IEEE Trans. Circuits Syst. II Express Briefs, 2015

An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature.
Sensors, 2015

A Novel 2D-to-3D Video Conversion Method Using Time-Coherent Depth Maps.
Sensors, 2015

High-Performance Motion Estimation for Image Sensors with Video Compression.
Sensors, 2015

A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction.
Sensors, 2015

Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array.
J. Circuits Syst. Comput., 2015

Low-Power Loop Parallelization onto CGRA Utilizing Variable Dual V<sub>DD</sub>.
IEICE Trans. Inf. Syst., 2015

The Implementation of Texture-Based Video Up-Scaling on Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2015

Battery-Aware Loop Nests Mapping for CGRAs.
IEICE Trans. Inf. Syst., 2015

Mapping Multi-Level Loop Nests onto CGRAs Using Polyhedral Optimizations.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2015

Exploring partitioning methods for multicast in 3D bufferless Network on Chip.
IEICE Electron. Express, 2015

Mapping of Embedded Applications on Hybrid Networks-on-Chip with Multiple Switching Mechanisms.
IEEE Embed. Syst. Lett., 2015

Reliability-aware mapping for various NoC topologies and routing algorithms under performance constraints.
Sci. China Inf. Sci., 2015

A Multi-modal 2D + 3D Face Recognition Method with a Novel Local Feature Descriptor.
Proceedings of the 2015 IEEE Winter Conference on Applications of Computer Vision, 2015

Partitioning Methods for Multicast in Bufferless 3D Network on Chip.
Proceedings of the Computer Engineering and Technology - 19th CCF Conference, 2015

Neural approximating architecture targeting multiple application domains.
Proceedings of the 2015 IEEE International Symposium on Circuits and Systems, 2015

Real-time time-consistent 2D-to-3D video conversion based on color histogram.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Efficient lane detection system based on monocular camera.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

An automatic depth map generation method by image classification.
Proceedings of the IEEE International Conference on Consumer Electronics, 2015

Acceleration of Nested Conditionals on CGRAs via Trigger Scheme.
Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2015

Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Novel Composite Method to Accelerate Control Flow on Reconfigurable Architecture (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only).
Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2015

Cooperatively managing dynamic writeback and insertion policies in a last-level DRAM cache.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Joint affine transformation and loop pipelining for mapping nested loop on CGRAs.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

RNA: a reconfigurable architecture for hardware neural acceleration.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Acceleration of control flows on reconfigurable architecture with a composite method.
Proceedings of the 52nd Annual Design Automation Conference, 2015

Efficient memory partitioning for parallel data access in multidimensional arrays.
Proceedings of the 52nd Annual Design Automation Conference, 2015

A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction.
Proceedings of the 52nd Annual Design Automation Conference, 2015

A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space.
Proceedings of the 2015 IEEE Custom Integrated Circuits Conference, 2015

Scheduling stream programs with improving arithmetic unit usage on NoC-based VLIW multi-core architectures.
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Battery-aware mapping optimization of loop nests for CGRAs.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

A novel approach using a minimum cost maximum flow algorithm for fault-tolerant topology reconfiguration in NoC architectures.
Proceedings of the 20th Asia and South Pacific Design Automation Conference, 2015

2014
On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time.
IEEE Trans. Very Large Scale Integr. Syst., 2014

SimRPU: A Simulation Environment for Reconfigurable Architecture Exploration.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Software/Hardware Parallel Long-Period Random Number Generation Framework Based on the WELL Method.
IEEE Trans. Very Large Scale Integr. Syst., 2014

Compiler-Assisted Leakage- and Temperature- Aware Instruction-Level VLIW Scheduling.
IEEE Trans. Very Large Scale Integr. Syst., 2014

A High-Utilization Scheduling Scheme of Stream Programs on Clustered VLIW Stream Architectures.
IEEE Trans. Parallel Distributed Syst., 2014

A Multi-Modal Face Recognition Method Using Complete Local Derivative Patterns and Depth Maps.
Sensors, 2014

A 1/2.5 inch VGA 400 fps CMOS Image Sensor With High Sensitivity for Machine Vision.
IEEE J. Solid State Circuits, 2014

Hybrid circuit-switched network for on-chip communication in large-scale chip-multiprocessors.
J. Parallel Distributed Comput., 2014

MapReduce inspired loop mapping for coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2014

Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2014

Implementation of AVS Jizhun decoder with HW/SW partitioning on a coarse-grained reconfigurable multimedia system.
Sci. China Inf. Sci., 2014

Implementation of multi-standard video decoder on a heterogeneous coarse-grained reconfigurable processor.
Sci. China Inf. Sci., 2014

Optimization of speeded-up robust feature algorithm for hardware implementation.
Sci. China Inf. Sci., 2014

A fast and robust traffic sign recognition method using ring of RIBP histograms based feature.
Proceedings of the 2014 IEEE International Conference on Robotics and Biomimetics, 2014

Low-power loop pipelining mapping onto CGRA utilizing variable dual VDD.
Proceedings of the IEEE 57th International Midwest Symposium on Circuits and Systems, 2014

A 65 nm uneven-dual-core SoC based platform for multi-device collaborative computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

A parallel hardware architecture for fast integral image computing.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

Map-reduce inspired loop parallelization on CGRA.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2014

A FAST Extreme Illumination Robust Feature in Affine Space.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

Teach Reconfigurable Computing using mixed-grained fabrics based hardware infrastructure.
Proceedings of the IEEE Frontiers in Education Conference, 2014

Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014

Extending lifetime of battery-powered coarse-grained reconfigurable computing platforms.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013
Low-Power Reconfigurable Processor Utilizing Variable Dual V<sub>DD</sub>.
IEEE Trans. Circuits Syst. II Express Briefs, 2013

A fault tolerant NoC architecture using quad-spare mesh topology and dynamic reconfiguration.
J. Syst. Archit., 2013

Energy-efficient stream task scheduling scheme for embedded multimedia applications on multi-issued stream architectures.
J. Syst. Archit., 2013

Calibration Techniques for Low-Power Wireless Multiband Transceiver.
Int. J. Distributed Sens. Networks, 2013

Concurrent Detection and Recognition of Individual Object Based on Colour and p-SIFT Features.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

An Inductive-Coupling Interconnected Application-Specific 3D NoC Design.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Battery-Aware Task Mapping for Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2013

Affine Transformations for Communication and Reconfiguration Optimization of Mapping Loop Nests on CGRAs.
IEICE Trans. Inf. Syst., 2013

The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Parallelization of Computing-Intensive Tasks of SIFT Algorithm on a Reconfigurable Architecture System.
IEICE Trans. Fundam. Electron. Commun. Comput. Sci., 2013

Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip.
IEICE Trans. Inf. Syst., 2013

An efficient VLSI architecture of speeded-up robust feature extraction for high resolution and high frame rate video.
Sci. China Inf. Sci., 2013

Hierarchical representation of on-chip context to reduce reconfiguration time and implementation area for coarse-grained reconfigurable architecture.
Sci. China Inf. Sci., 2013

ReSSIM: a mixed-level simulator for dynamic coarse-grained reconfigurable processor.
Sci. China Inf. Sci., 2013

SPC: An Approach to Guarantee Performance in Cost Oriented Mapping Algorithm for NoC Architectures.
Proceedings of the IEEE Eighth International Conference on Networking, 2013

Battery-Aware MAC Analytical Modeling for Extending Lifetime of Low Duty-Cycled Wireless Sensor Network.
Proceedings of the IEEE Eighth International Conference on Networking, 2013

Compiler-assisted leakage energy optimization of media applications on stream architectures.
Proceedings of the International Symposium on Quality Electronic Design, 2013

A VLSI architecture for enhancing the fault tolerance of NoC using quad-spare mesh topology and dynamic reconfiguration.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Affine transformations for communication and reconfiguration optimization of loops on CGRAs.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Implementation of multi-standard video decoding algorithms on a coarse-grained reconfigurable multimedia processor.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013

Mapping IDCT of MPEG2 on Coarse-Grained Reconfigurable Array for Matching 1080p Video Decoding.
Proceedings of the Advanced Technologies, Embedded and Multimedia for Human-centric Computing, 2013

Polyhedral model based mapping optimization of loop nests for CGRAs.
Proceedings of the 50th Annual Design Automation Conference 2013, 2013

SURFEX: A 57fps 1080P resolution 220mW silicon implementation for simplified speeded-up robust feature with 65nm process.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

An energy-efficient coarse-grained dynamically reconfigurable fabric for multiple-standard video decoding applications.
Proceedings of the IEEE 2013 Custom Integrated Circuits Conference, 2013

A power-efficient network-on-chip for multi-core stream processors.
Proceedings of the IEEE 10th International Conference on ASIC, 2013

2012
Configuration Context Reduction for Coarse-Grained Reconfigurable Architecture.
IEICE Trans. Inf. Syst., 2012

Hybrid Wired/Wireless On-Chip Network Design for Application-Specific SoC.
IEICE Trans. Electron., 2012

Multi-Battery Scheduling for Battery-Powered DVS Systems.
IEICE Trans. Commun., 2012

Mapping Optimization of Affine Loop Nests for Reconfigurable Computing Architecture.
IEICE Trans. Inf. Syst., 2012

Reconfiguration Process Optimization of Dynamically Coarse Grain Reconfigurable Architecture for Multimedia Applications.
IEICE Trans. Inf. Syst., 2012

Reducing configuration contexts for coarse-grained reconfigurable architecture.
Proceedings of the 2012 IEEE International Symposium on Circuits and Systems, 2012

Low Power Schedule Algorithm for Embedded Multimedia Applications Basing on Imagine-L Processor.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Software/hardware framework for generating parallel Gaussian random numbers based on the Monty Python method.
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012

An Efficient Hardware Random Number Generator Based on the MT Method.
Proceedings of the 12th IEEE International Conference on Computer and Information Technology, 2012

2011
A high efficient baseband transceiver for IEEE 802.15.4 LR-WPAN systems.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

Performance evaluation modeling for reconfigurable processor.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

An energy efficiency task scheduling algorithm for streaming applications on multiprocessor SoC.
Proceedings of the 2011 IEEE 9th International Conference on ASIC, 2011

2010
A Cycle-Accurate Simulator for a Reconfigurable Multi-Media System.
IEICE Trans. Inf. Syst., 2010

CropNET: A Wireless Multimedia Sensor Network for Agricultural Monitoring.
IEICE Trans. Commun., 2010

Parallelization of Computing-Intensive Tasks of the H.264 High Profile Decoding Algorithm on a Reconfigurable Multimedia System.
IEICE Trans. Inf. Syst., 2010

A reconfigurable multi-processor SoC for media applications.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

A VLSI design of sensor node for wireless image sensor network.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Parallel implementation of computing-intensive decoding algorithms of H.264 on reconfigurable SoC.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Reconfigurable computing - evolution of Von Neumann architecture.
Proceedings of the International Conference on Field-Programmable Technology, 2010

Battery aware tasks allocating algorithm for multi-battery operated system.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

Mixed-level modeling for network on chip infrastructure in SoC design.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2010

2009
Compiler Framework for Reconfigurable Computing Architecture.
IEICE Trans. Electron., 2009

Buffer planning for application-specific networks-on-chip design.
Sci. China Ser. F Inf. Sci., 2009

2008
Key technologies of system on chip design.
Sci. China Ser. F Inf. Sci., 2008

2007
Battery-Aware Variable Voltage Scheduling on Real-Time Multiprocessor Platforms.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

2006
On handling the fixed-outline constraints of floorplanning using less flexibility first principles.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

2003
Emerging markets: design goes global.
Proceedings of the 40th Design Automation Conference, 2003

