2025
F³: An FPGA-Based Transformer Fine-Tuning Accelerator With Flexible Floating Point Format.
IEEE J. Emerg. Sel. Topics Circuits Syst., June, 2025
FANNS: An FPGA-Based Approximate Nearest-Neighbor Search Accelerator.
IEEE Trans. Very Large Scale Integr. Syst., April, 2025
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2025
2024
EN-TensorCore: Advancing TensorCores Performance through Encoder-Based Methodology.
CoRR, 2024
Artificial Neural Network based Model for Power GaN HEMTs down to 4.2K.
Proceedings of the IEEE International Conference on Integrated Circuits, 2024
EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology.
Proceedings of the 42nd IEEE International Conference on Computer Design, 2024
Task-Level Parallelism for the Multifrontal Method in Tightly Coupled CPU-FPGA Architectures.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2024
HBM-Based Hardware Accelerator for GNN Sampling and Aggregation.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2024
Tightly-Coupled FPGA Accelerator for Molecular Dynamics Simulation: Hardware-Software Co-Design and Fine-Grained Task Management.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2024
Efficient Message Passing Architecture for GCN Training on HBM-based FPGAs with Orthogonal Topology On-Chip Networks.
Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2024
RingTK: A Ring, Parallel and High Performance Top-K Sorter on FPGA.
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024
A FPGA-HBM-Based Hardware Streaming Accelerator for GNN Sampling.
Proceedings of the 35th IEEE International Conference on Application-specific Systems, 2024
2023
FTW-GAT: An FPGA-Based Accelerator for Graph Attention Networks With Ternary Weights.
IEEE Trans. Circuits Syst. II Express Briefs, November, 2023
Degree-Aware Graph Neural Network Quantization.
Entropy, November, 2023
MCANet: Multiscale Cross-Modality Attention Network for Multispectral Pedestrian Detection.
Proceedings of the MultiMedia Modeling - 29th International Conference, 2023
Scheduling Memory Access Optimization for HBM Based on CLOS.
Proceedings of the 25th International Conference on Advanced Communication Technology, 2023
Dynamic Neural Network Accelerator for Multispectral Detection Based on FPGA.
Proceedings of the 25th International Conference on Advanced Communication Technology, 2023
2022
Exploration of Balanced Design in Resource-Constrained Edge Device for Efficient CNNs.
IEEE Trans. Circuits Syst. II Express Briefs, 2022
BaPipe: Balanced Pipeline Parallelism for DNN Training.
Parallel Process. Lett., 2022
QEGCN: An FPGA-based accelerator for quantized GCNs with edge-level parallelism.
J. Syst. Archit., 2022
G-NMP: Accelerating Graph Neural Networks with DIMM-based Near-Memory Processing.
J. Syst. Archit., 2022
FP-GNN: Adaptive FPGA accelerator for Graph Neural Networks.
Future Gener. Comput. Syst., 2022
Hardware Acceleration of Sampling Algorithms in Sample and Aggregate Graph Neural Networks.
CoRR, 2022
HuGraph: Acceleration of GCN Training on Heterogeneous FPGA Clusters with Quantization.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
An SSD-Based Accelerator for Singular Value Decomposition Recommendation Algorithm on Edge.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
2021
A hybrid precision low power computing-in-memory architecture for neural networks.
Microprocess. Microsystems, 2021
A Gather Accelerator for GNNs on FPGA Platform.
Proceedings of the 27th IEEE International Conference on Parallel and Distributed Systems, 2021
Software-Hardware Co-Optimization on Partial-Sum Problem for PIM-based Neural Network Accelerator.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
2020
FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters.
IEEE Trans. Computers, 2020
A Real-Time Learning-Based Super-Resolution System on FPGA.
Parallel Process. Lett., 2020
Enhancing energy efficiency of RISC-V processor-based embedded graphics systems through frame buffer compression.
Microprocess. Microsystems, 2020
FPGA Implementation of A* Algorithm for Real-Time Path Planning.
Int. J. Reconfigurable Comput., 2020
BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training.
CoRR, 2020
TB-DNN: A Thin Binarized Deep Neural Network with High Accuracy.
Proceedings of the 22nd International Conference on Advanced Communication Technology, 2020
Exploration of Memory Access Optimization for FPGA-based 3D CNN Accelerator.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020
RISC-V Graphics Rendering Instruction Set Extensions for Embedded AI Chips Implementation.
Proceedings of the BDET 2020: 2nd International Conference on Big Data Engineering and Technology, 2020
2019
A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing.
CoRR, 2019
CINT - An Energy-efficient Mixed-signal In-Memory CNN Accelerator Based on NOR Flash Memory.
Proceedings of the 17th Annual International Conference on Mobile Systems, 2019
FP-AMR: A Reconfigurable Fabric Framework for Adaptive Mesh Refinement Applications.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019
Accelerating AP3M-Based Computational Astrophysics Simulations with Reconfigurable Clusters.
Proceedings of the 30th IEEE International Conference on Application-specific Systems, 2019
2018
RP-Ring: A Heterogeneous Multi-FPGA Accelerator.
Int. J. Reconfigurable Comput., 2018
A pipelined division for fixed operation using user-defined floating point.
Proceedings of the 20th International Conference on Advanced Communication Technology, 2018
Accelerating a radio astronomy correlator on FPGA.
Proceedings of the 20th International Conference on Advanced Communication Technology, 2018
Soft-Core, Multiple-Lane, FPGA-based ADCs for a Liquid Helium Environment.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
An efficient resource-optimized learning prefetcher for solid state drives.
Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition, 2018
A Real-Time Learning-Based Super-Resolution System Using Direct Simple Functions.
Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018
2017
An Efficient Hardware Prefetcher Exploiting the Prefetch Potential of Long-Stride Access Pattern on Virtual Address.
Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), 2017
2016
A real-time global stereo-matching on FPGA.
Microprocess. Microsystems, 2016
An Accelerating Solution for N-Body MOND Simulation with FPGA-SoC.
Int. J. Reconfigurable Comput., 2016
Fixed-ratio DXT format Frame Buffer Compressor for mobile graphics systems.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016
FPGA acceleration of TreePM N-body simulations for Modified Newton Dynamics.
Proceedings of the 2016 International Conference on Field-Programmable Technology, 2016
An Improved Global Stereo-Matching on FPGA for Real-Time Applications (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
An Extensible Heterogeneous Multi-FPGA Framework for Accelerating N-body Simulation (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
An FPGA-SOC Based Accelerating Solution for N-body Simulations in MOND (Abstract Only).
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016
RP-Ring: A Heterogeneous Multi-FPGA Accelerating Solution for N-Body Simulations.
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016
2015
Design of a Distributed Compressor for Astronomy SSD.
Proceedings of the 23rd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2015
2014
A Multi-phase Clock Time-to-Digital Convertor Based on ISERDES Architecture.
Proceedings of the 22nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2014