Weifeng Liu

ORCID: 0000-0002-2150-5759

Affiliations:
  • China University of Petroleum, Beijing, China
  • Norwegian University of Science and Technology (former)
  • University of Copenhagen, Niels Bohr Institute (NBI) (former)
  • STFC Rutherford Appleton Laboratory, Didcot, UK (former)


According to our database, Weifeng Liu authored at least 50 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
thSORT: an efficient parallel sorting algorithm on multi-core DSPs.
CCF Trans. High Perform. Comput., October, 2024

Cuper: Customized Dataflow and Perceptual Decoding for Sparse Matrix-Vector Multiplication on HBM-Equipped FPGAs.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Efficient Spectral-Aware Power Supply Noise Analysis for Low-Power Design Verification.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

MASC: A Memory-Efficient Adjoint Sensitivity Analysis through Compression Using Novel Spatiotemporal Prediction.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

ReCG: ReRAM-Accelerated Sparse Conjugate Gradient.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Machine Learning and GPU Accelerated Sparse Linear Solvers for Transistor-Level Circuit Simulation: A Perspective Survey (Invited Paper).
Proceedings of the 29th Asia and South Pacific Design Automation Conference, 2024

2023
TileSpTRSV: a tiled algorithm for parallel sparse triangular solve on GPUs.
CCF Trans. High Perform. Comput., June, 2023

Editorial for the special issue on architecture, algorithms and applications of high performance sparse matrix computations.
CCF Trans. High Perform. Comput., June, 2023

DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication.
Proceedings of the International Conference for High Performance Computing, 2023

PanguLU: A Scalable Regular Two-Dimensional Block-Cyclic Sparse Direct Solver on Distributed Heterogeneous Systems.
Proceedings of the International Conference for High Performance Computing, 2023

HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

Accelerating Sparse LU Factorization with Density-Aware Adaptive Matrix Multiplication for Circuit Simulation.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

AmgR: Algebraic Multigrid Accelerated on ReRAM.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

Balancing Computation and Communication in Distributed Sparse Matrix-Vector Multiplication.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures.
IEEE Trans. Parallel Distributed Syst., 2022

TileSpGEMM: a tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

TileSpMSpV: A Tiled Algorithm for Sparse Matrix-Sparse Vector Multiplication on GPUs.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
YuenyeungSpTRSV: A Thread-Level and Warp-Level Fusion Synchronization-Free Sparse Triangular Solve.
IEEE Trans. Parallel Distributed Syst., 2021

BALS: Blocked Alternating Least Squares for Parallel Sparse Matrix Factorization on GPUs.
IEEE Trans. Parallel Distributed Syst., 2021

Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations.
Int. J. Parallel Program., 2021

Implementing LU and Cholesky factorizations on artificial intelligence accelerators.
CCF Trans. High Perform. Comput., 2021

TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

PALBBD: A Parallel ArcLength Method Using Bordered Block Diagonal Form for DC Analysis.
Proceedings of the GLSVLSI '21: Great Lakes Symposium on VLSI 2021, 2021

SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs.
Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

2020
clMF: A fine-grained and portable alternating least squares algorithm for parallel matrix factorization.
Future Gener. Comput. Syst., 2020

NUMA-Aware Optimization of Sparse Matrix-Vector Multiplication on ARMv8-Based Many-Core Architectures.
Proceedings of the Network and Parallel Computing, 2020

Segmented Merge: A New Primitive for Parallel Sparse Matrix Computations.
Proceedings of the Network and Parallel Computing, 2020

CapelliniSpTRSV: A Thread-Level Synchronization-Free Sparse Triangular Solve on GPUs.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Efficient Block Algorithms for Parallel Sparse Triangular Solve.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019
Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication.
Int. J. Parallel Program., 2019

Performance evaluation and analysis of sparse matrix and graph kernels on heterogeneous processors.
CCF Trans. High Perform. Comput., 2019

IA-SpGEMM: an input-aware auto-tuning framework for parallel sparse matrix-matrix multiplication.
Proceedings of the ACM International Conference on Supercomputing, 2019

2018
Back-dropout transfer learning for action recognition.
IET Comput. Vis., 2018

swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Register-based implementation of the sparse general matrix-matrix multiplication on GPUs.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Warp-Consolidation: A Novel Execution Model for GPUs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

2017
Fast synchronization-free algorithms for parallel sparse triangular solves with multiple right-hand sides.
Concurr. Comput. Pract. Exp., 2017

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels.
Proceedings of the International Conference for High Performance Computing, 2017

Efficient and Portable ALS Matrix Factorization for Recommender Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Fast segmented sort on GPUs.
Proceedings of the International Conference on Supercomputing, 2017

Locality-Aware CTA Clustering for Modern GPUs.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Parallel Transposition of Sparse Data Structures.
Proceedings of the 2016 International Conference on Supercomputing, 2016

A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

2015
Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors.
Parallel Comput., 2015

A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors.
J. Parallel Distributed Comput., 2015

CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Unsupervised Behavior-Specific Dictionary Learning for Abnormal Event Detection.
Proceedings of the British Machine Vision Conference 2015, 2015

2014
An Efficient GPU General Sparse Matrix-Matrix Multiplication for Irregular Data.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

ad-heap: an Efficient Heap Data Structure for Asymmetric Multicore Processors.
Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

