Ichitaro Yamazaki

David J. Keffer

Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Bringing High Performance Computing to Big Data Algorithms.

Proceedings of the Handbook of Big Data Technologies, 2017

2016

Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU.

ACM Trans. Math. Softw., 2016

Linear algebra software for large-scale accelerated multicore computing.

Acta Numer., 2016

Heterogeneous Streaming.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems.

Supercomput. Front. Innov., 2015

Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations.

Sci. Program., 2015

Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs.

SIAM J. Sci. Comput., 2015

A survey of recent developments in parallel implementations of Gaussian elimination.

Concurr. Comput. Pract. Exp., 2015

Mixed-precision block Gram-Schmidt orthogonalization.

Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015

Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster.

Proceedings of the International Conference for High Performance Computing, 2015

Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.

Proceedings of the International Conference for High Performance Computing, 2015

Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

2014

Communication-Avoiding Symmetric-Indefinite Factorization.

SIAM J. Matrix Anal. Appl., 2014

Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.

Parallel Process. Lett., 2014

Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems.

Concurr. Comput. Pract. Exp., 2014

Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs.

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

Deflation strategies to improve the convergence of communication-avoiding GMRES.

Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster.

Sivasankaran Rajamanickam

Proceedings of the International Conference for High Performance Computing, 2014

Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.

Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014

Improving the Performance of CA-GMRES on Multicores with Multiple GPUs.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Krylov Subspace Solvers on Graphics Processing Units.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Access-averse framework for computing low-rank matrix approximations.

Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

Accelerating Numerical Dense Linear Algebra Calculations with GPUs.

Proceedings of the Numerical Computations with GPUs, 2014

2013

Performance comparison of parallel eigensolvers based on a contour integral method and a Lanczos method.

Parallel Comput., 2013

On Partitioning and Reordering Problems in a Hierarchically Parallel Hybrid Linear Solver.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Virtual Systolic Array for QR Decomposition.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures.

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

dqds with Aggressive Early Deflation.

Yuji Nakatsukasa

Kensuke Aishima

SIAM J. Matrix Anal. Appl., 2012

One-sided Dense Matrix Factorizations on a Multicore with Multiple GPU Accelerators.

Proceedings of the International Conference on Computational Science, 2012

A hybrid Hermitian general eigenvalue solver.

CoRR, 2012

Poster: Matrices over Runtime Systems at Exascale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Matrices Over Runtime Systems at Exascale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

New Scheduling Strategies and Hybrid Programming for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Cluster Systems.

Xiaoye S. Li

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

A Communication-Avoiding Thick-Restart Lanczos Method on a Distributed-Memory System.
