Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018

A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition.

Xiaowen Chen

Yuanwu Lei

Zhonghai Lu

Shuming Chen

IEEE Trans. Very Large Scale Integr. Syst., 2018

2017

Low Latency and Low Error Floating-Point Sine/Cosine Function Based TCORDIC Algorithm.

IEEE Trans. Circuits Syst. I Regul. Pap., 2017

Platform-Adaptive High-Throughput Surveillance Video Condensation on Heterogeneous Processor Clusters.

Proceedings of the Advanced Parallel Processing Technologies, 2017

2016

Classification of Hyperspectral Remote Sensing Image Using Hierarchical Local-Receptive-Field-Based Extreme Learning Machine.

IEEE Geosci. Remote. Sens. Lett., 2016

An efficient and effective convolutional auto-encoder extreme learning machine network for 3d feature learning.

Neurocomputing, 2016

PR-ELM: Parallel regularized extreme learning machine based on cluster.

Neurocomputing, 2016

Multi-bit transient fault control for NoC links using 2D fault coding method.

Proceedings of the Tenth IEEE/ACM International Symposium on Networks-on-Chip, 2016

Single/Double Precision Floating-Point Division and Square Root Unit Based on SRT-8 Algorithm.

Proceedings of the Computer Engineering and Technology - 20th CCF Conference, 2016

2015

A deeply-pipelined FPGA-based SpMV accelerator with a hardware-friendly storage scheme.

IEICE Electron. Express, 2015

An efficient multi-standard QC-LDPC decoder based on the row-layered decoding algorithm.

IEICE Electron. Express, 2015

Accelerating Molecular Dynamics Simulations on Heterogeneous Architecture.

Proceedings of the Computer Engineering and Technology - 19th CCF Conference, 2015

Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs.

Proceedings of the Computer Engineering and Technology - 19th CCF Conference, 2015

2014

FPGA Implementation of a Special-Purpose VLIW Structure for Double-Precision Elementary Function.

ACM Trans. Reconfigurable Technol. Syst., 2014

Transpose-free variable-size FFT accelerator based on-chip SRAM.

IEICE Electron. Express, 2014

CPU-GPU hybrid parallel strategy for cosmological simulations.

Concurr. Comput. Pract. Exp., 2014

2013

FPGA implementation of an exact dot product and its application in variable-precision floating-point arithmetic.

J. Supercomput., 2013

VLIW coprocessor for IEEE-754 quadruple-precision elementary functions.

ACM Trans. Archit. Code Optim., 2013

Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access.

IEICE Trans. Inf. Syst., 2013

2012

Design and Implementation of the Parameterized Multi-Standard High-Throughput Radix-4 Viterbi Decoder on FPGA.

IEICE Trans. Commun., 2012

2011

FPGA-Specific Custom VLIW Architecture for Arbitrary Precision Floating-Point Arithmetic.

Yuanwu Lei

Yong Dou

Jie Zhou

IEICE Trans. Inf. Syst., 2011

Special-purposed VLIW architecture for IEEE-754 quadruple precision elementary functions on FPGA.

Proceedings of the IEEE 29th International Conference on Computer Design, 2011

VPFPAP: A Special-Purpose VLIW Processor for Variable-Precision Floating-Point Arithmetic.

Proceedings of the International Conference on Field Programmable Logic and Applications, 2011

FPGA Implementation of Variable-Precision Floating-Point Arithmetic.

Proceedings of the Advanced Parallel Processing Technologies - 9th International Symposium, 2011

2010

A Unified Co-Processor Architecture for Matrix Decomposition.

J. Comput. Sci. Technol., 2010

FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing.

Proceedings of the 24th International Conference on Supercomputing, 2010

2009

FPGA accelerating three QR decomposition algorithms in the unified pipelined framework.

Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

A Fine-grained Pipelined Implementation of the LINPACK Benchmark on FPGAs.

Proceedings of the 17th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2009), 2009

A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA.

Proceedings of the Advanced Parallel Processing Technologies, 8th International Symposium, 2009

2008

Dynamic Configurable Floating-Point FFT Pipelines and Hybrid-Mode CORDIC on FPGA.

Proceedings of the International Conference on Embedded Software and Systems, 2008

Double Precision Hybrid-Mode Floating-Point FPGA CORDIC Co-processor.

Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

Hybrid-Mode Floating-Point FPGA CORDIC Co-processor.

Proceedings of the Reconfigurable Computing: Architectures, 2008

Area and throughput trade-offs in design of arithmetic encoder for JPEG2000.

Baofeng Li

Yong Dou

Yuanwu Lei

Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

2007

FPGA SAR Processor with Window Memory Accesses.

Proceedings of the IEEE International Conference on Application-Specific Systems, 2007