Panruo Wu

Orcid: 0000-0003-1859-3580

According to our database, Panruo Wu authored at least 36 papers between 2011 and 2024.

Bibliography

2024
Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition.
CoRR, 2024

Pipelet: Practical Streamlined Blockchain Protocol.
CoRR, 2024

2023
Dynamic Mode Decomposition for Large-Scale Coherent Structure Extraction in Shear Flows.
IEEE Trans. Vis. Comput. Graph., 2023

Fast Symmetric Eigenvalue Decomposition via WY Representation on Tensor Core.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

2021
Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

2020
Basic Linear Algebra Operations on TensorCore GPU.
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020

TensorSVM: accelerating kernel machines with tensor engine.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

High Accuracy Matrix Computations on Neural Engines: A Study of QR Factorization and its Applications.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

Wukong: a scalable and locality-enhanced framework for serverless parallel computing.
Proceedings of the SoCC '20: ACM Symposium on Cloud Computing, 2020

2019
PLASMA: Parallel Linear Algebra Software for Multicore Using OpenMP.
ACM Trans. Math. Softw., 2019

High Accuracy Low Precision QR Factorization and Least Square Solver on GPU with TensorCore.
CoRR, 2019

xSVM: Scalable Distributed Kernel Support Vector Machine Training.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018
Symmetric Indefinite Linear Solver Using OpenMP Task on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2018

Fault tolerant one-sided matrix decompositions on heterogeneous systems with GPUs.
Proceedings of the International Conference for High Performance Computing, 2018

Work-in-Progress: Incorporating Deadline-Based Scheduling in Tasking Programming Model for Extreme-Scale Parallel Computing.
Proceedings of the 2018 IEEE Real-Time Systems Symposium, 2018

The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018

2017
Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support.
IEEE Trans. Signal Process., 2017

Correcting soft errors online in fast fourier transform.
Proceedings of the International Conference for High Performance Computing, 2017

Investigating half precision arithmetic to accelerate dense linear system solvers.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

Silent Data Corruption Resilient Two-sided Matrix Factorizations.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

2016
Silent Data Corruption Resilient Matrix Factorizations on Distributed Memory System.
PhD thesis, 2016

Design, Use and Evaluation of P-FSEFI: A Parallel Soft Error Fault Injection Framework for Emulating Soft Errors in Parallel Applications.
Proceedings of the 9th EAI International Conference on Simulation Tools and Techniques, 2016

GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing.
Proceedings of the International Conference for High Performance Computing, 2016

Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Towards Practical Algorithm Based Fault Tolerance in Dense Linear Algebra.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

New-Sum: A Novel Online ABFT Scheme For General Iterative Methods.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

SDC is in the Eye of the Beholder: A Survey and Preliminary Study.
Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2016

2015
Fail-Stop Failure Algorithm-Based Fault Tolerance for Cholesky Decomposition.
IEEE Trans. Parallel Distributed Syst., 2015

Accelerated Discrete Distribution Clustering under Wasserstein Distance.
CoRR, 2015

Investigating the Interplay between Energy Efficiency and Resilience in High Performance Computing.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Extending checksum-based ABFT to tolerate soft errors online in iterative methods.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

FT-ScaLAPACK: correcting soft errors on-line for ScaLAPACK cholesky, QR, and LU factorization routines.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

2013
On-line soft error correction in matrix-matrix multiplication.
J. Comput. Sci., 2013

Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach.
Proceedings of the International Conference for High Performance Computing, 2013

2012
Energy Efficient Parallel Matrix-Matrix Multiplication for DVFS-enabled Clusters.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

2011
Fault tolerant matrix-matrix multiplication: correcting soft errors on-line.
Proceedings of the second workshop on Scalable algorithms for large-scale systems, 2011

