Yuanwu Lei

Orcid: 0000-0003-3587-1618

According to our database, Yuanwu Lei authored at least 39 papers between 2007 and 2022.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2022
MT-3000: a heterogeneous multi-zone processor for HPC.
CCF Trans. High Perform. Comput., 2022

2021
Advancing DSP into HPC, AI, and beyond: challenges, mechanisms, and future directions.
CCF Trans. High Perform. Comput., 2021

2019
Pair-HMM accelerator based on non-cooperative structure.
IEICE Electron. Express, 2019

MT-DMA: A DMA Controller Supporting Efficient Matrix Transposition for Digital Signal Processing.
IEEE Access, 2019

An Efficient Direct Memory Access (DMA) Controller for Scientific Computing Accelerators.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2019

Efficient Large-Scale 1D FFT Vectorization on Multi-Core Vector Accelerator.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018
A Variable-Size FFT Hardware Accelerator Based on Matrix Transposition.
IEEE Trans. Very Large Scale Integr. Syst., 2018

2017
Low Latency and Low Error Floating-Point Sine/Cosine Function Based TCORDIC Algorithm.
IEEE Trans. Circuits Syst. I Regul. Pap., 2017

Platform-Adaptive High-Throughput Surveillance Video Condensation on Heterogeneous Processor Clusters.
Proceedings of the Advanced Parallel Processing Technologies, 2017

2016
Classification of Hyperspectral Remote Sensing Image Using Hierarchical Local-Receptive-Field-Based Extreme Learning Machine.
IEEE Geosci. Remote. Sens. Lett., 2016

An efficient and effective convolutional auto-encoder extreme learning machine network for 3d feature learning.
Neurocomputing, 2016

PR-ELM: Parallel regularized extreme learning machine based on cluster.
Neurocomputing, 2016

Multi-bit transient fault control for NoC links using 2D fault coding method.
Proceedings of the Tenth IEEE/ACM International Symposium on Networks-on-Chip, 2016

Single/Double Precision Floating-Point Division and Square Root Unit Based on SRT-8 Algorithm.
Proceedings of the Computer Engineering and Technology - 20th CCF Conference, 2016

2015
A deeply-pipelined FPGA-based SpMV accelerator with a hardware-friendly storage scheme.
IEICE Electron. Express, 2015

An efficient multi-standard QC-LDPC decoder based on the row-layered decoding algorithm.
IEICE Electron. Express, 2015

Accelerating Molecular Dynamics Simulations on Heterogeneous Architecture.
Proceedings of the Computer Engineering and Technology - 19th CCF Conference, 2015

Designing Parallel Sparse Matrix Transposition Algorithm Using ELLPACK-R for GPUs.
Proceedings of the Computer Engineering and Technology - 19th CCF Conference, 2015

2014
FPGA Implementation of a Special-Purpose VLIW Structure for Double-Precision Elementary Function.
ACM Trans. Reconfigurable Technol. Syst., 2014

Transpose-free variable-size FFT accelerator based on-chip SRAM.
IEICE Electron. Express, 2014

CPU-GPU hybrid parallel strategy for cosmological simulations.
Concurr. Comput. Pract. Exp., 2014

2013
FPGA implementation of an exact dot product and its application in variable-precision floating-point arithmetic.
J. Supercomput., 2013

VLIW coprocessor for IEEE-754 quadruple-precision elementary functions.
ACM Trans. Archit. Code Optim., 2013

Window Memory Layout Scheme for Alternate Row-Wise/Column-Wise Matrix Access.
IEICE Trans. Inf. Syst., 2013

2012
Design and Implementation of the Parameterized Multi-Standard High-Throughput Radix-4 Viterbi Decoder on FPGA.
IEICE Trans. Commun., 2012

2011
FPGA-Specific Custom VLIW Architecture for Arbitrary Precision Floating-Point Arithmetic.
IEICE Trans. Inf. Syst., 2011

Special-purposed VLIW architecture for IEEE-754 quadruple precision elementary functions on FPGA.
Proceedings of the IEEE 29th International Conference on Computer Design, 2011

VPFPAP: A Special-Purpose VLIW Processor for Variable-Precision Floating-Point Arithmetic.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2011

FPGA Implementation of Variable-Precision Floating-Point Arithmetic.
Proceedings of the Advanced Parallel Processing Technologies - 9th International Symposium, 2011

2010
A Unified Co-Processor Architecture for Matrix Decomposition.
J. Comput. Sci. Technol., 2010

FPGA accelerating double/quad-double high precision floating-point applications for ExaScale computing.
Proceedings of the 24th International Conference on Supercomputing, 2010

2009
FPGA accelerating three QR decomposition algorithms in the unified pipelined framework.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

A Fine-grained Pipelined Implementation of the LINPACK Benchmark on FPGAs.
Proceedings of the FCCM 2009, 2009

A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA.
Proceedings of the Advanced Parallel Processing Technologies, 8th International Symposium, 2009

2008
Dynamic Configurable Floating-Point FFT Pipelines and Hybrid-Mode CORDIC on FPGA.
Proceedings of the International Conference on Embedded Software and Systems, 2008

Double Precision Hybrid-Mode Floating-Point FPGA CORDIC Co-processor.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, 2008

Hybrid-Mode Floating-Point FPGA CORDIC Co-processor.
Proceedings of the Reconfigurable Computing: Architectures, 2008

Area and throughput trade-offs in design of arithmetic encoder for JPEG2000.
Proceedings of the IEEE Asia Pacific Conference on Circuits and Systems, 2008

2007
FPGA SAR Processor with Window Memory Accesses.
Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

