Toshiyuki Imamura
ORCID: 0000-0003-1601-9710
According to our database, Toshiyuki Imamura authored at least 76 papers between 2000 and 2024.
Bibliography
2024
CoRR, 2024
2023
Proceedings of the 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2023
A new data conversion method for mixed precision Krylov solvers with FP16/BF16 Jacobi preconditioners.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023
2022
High Performance Parallel LOBPCG Method for Large Hamiltonian Derived from Hubbard Model on Multi-GPU Systems.
Proceedings of the Supercomputing Frontiers - 7th Asian Conference, 2022
GPU Optimization of Lattice Boltzmann Method with Local Ensemble Transform Kalman Filter.
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022
Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2022
2021
MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021
Iterative methods with mixed-precision preconditioning for ill-conditioned linear systems in multiphase CFD simulations.
Proceedings of the 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2021
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021
Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs.
Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021
Proceedings of the Computational Science and Its Applications - ICCSA 2021, 2021
2020
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.
CoRR, 2020
Error Analysis of the Cholesky QR-Based Block Orthogonalization Process for the One-Sided Block Jacobi SVD Algorithm.
Comput. Informatics, 2020
Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results?
Proceedings of the Software Verification - 12th International Conference, 2020
Proceedings of the High Performance Computing - 35th International Conference, 2020
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020
A 1024-member ensemble data assimilation with 3.5-km mesh global weather simulations.
Proceedings of the International Conference for High Performance Computing, 2020
Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method.
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the IEEE International Conference on Cluster Computing, 2020
Proceedings of the IEEE International Conference on Cluster Computing, 2020
2019
High Performance Eigenvalue Solver for Hubbard Model: Tuning Strategies for LOBPCG Method on CUDA GPU.
Proceedings of the Parallel Computing: Technology Trends, 2019
Proceedings of the Parallel Computing: Technology Trends, 2019
Proceedings of the Parallel Computing: Technology Trends, 2019
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019
2018
High Performance LOBPCG Method for Solving Multiple Eigenvalues of Hubbard Model: Efficiency of Communication Avoiding Neumann Expansion Preconditioner.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018
Application of a Preconditioned Chebyshev Basis Communication-Avoiding Conjugate Gradient Method to a Multiphase Thermal-Hydraulic CFD Code.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
A Case Study on Modeling the Performance of Dense Matrix Computation: Tridiagonalization in the EigenExa Eigensolver on the K Computer.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
Proceedings of the Computational Science - ICCS 2018, 2018
Poster Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018
2017
Application of a communication-avoiding generalized minimal residual method to a gyrokinetic five dimensional eulerian code on many core platforms.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the Parallel Processing and Applied Mathematics, 2017
Parallel Divide-and-Conquer Algorithm for Solving Tridiagonal Eigenvalue Problems on Manycore Systems.
Proceedings of the Parallel Processing and Applied Mathematics, 2017
Communication Avoiding Neumann Expansion Preconditioner for LOBPCG Method: Convergence Property of Exact Diagonalization Method for Hubbard Model.
Proceedings of the Parallel Computing is Everywhere, 2017
Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats.
Proceedings of the Parallel Computing is Everywhere, 2017
Quadruple-Precision BLAS Using Bailey's Arithmetic with FMA Instruction: Its Performance and Applications.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 24th IEEE International Conference on Electronics, Circuits and Systems, 2017
2016
Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations.
Comput. Phys. Commun., 2016
Left-Preconditioned Communication-Avoiding Conjugate Gradient Methods for Multiphase CFD Simulations on the K Computer.
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2016
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016
Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
2015
Performance Analysis of the Chebyshev Basis Conjugate Gradient Method on the K Computer.
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015
High Performance Eigenvalue Solver in Exact-diagonalization Method for Hubbard Model on CUDA GPU.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015
Proceedings of the Parallel Computing: On the Road to Exascale, 2015
Performance Evaluation of the Eigen Exa Eigensolver on Oakleaf-FX: Tridiagonalization Versus Pentadiagonalization.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
2014
Implementation of d-Spline-based incremental performance parameter estimation method with ppOpen-AT.
Sci. Program., 2014
Communication-overlap techniques for improved strong scaling of gyrokinetic Eulerian code beyond 100k cores on the K-computer.
Int. J. High Perform. Comput. Appl., 2014
Performance Analysis of the Householder-Type Parallel Tall-Skinny QR Factorizations Toward Automatic Algorithm Selection.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
A Study of Parallel Data Compression Using Proper Orthogonal Decomposition on the K Computer.
Proceedings of the 14th Eurographics Symposium on Parallel Graphics and Visualization, 2014
2013
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Parallel Computing Design for Exact Diagonalization Scheme on Multi-band Hubbard Cluster Models.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013
Proper orthogonal decomposition based parallel compression for visualizing big data on the K computer.
Proceedings of the IEEE Symposium on Large-Scale Data Analysis and Visualization, 2013
2012
Proceedings of the High Performance Computing for Computational Science, 2012
Poster: Preliminary Report for a High Precision Distributed Memory Parallel Eigenvalue Solver.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Abstract: Preliminary Report for a High Precision Distributed Memory Parallel Eigenvalue Solver.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Poster: Communication Overlap Techniques for Improved Strong Scaling of Gyrokinetic Eulerian Code beyond 100k Cores on the K-Computer.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Abstract: Communication Overlap Techniques for Improved Strong Scaling of Gyrokinetic Eulerian Code beyond 100k Cores on the K-Computer.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
2011
Parallelization design on multi-core platforms in density matrix renormalization group toward 2-D quantum strongly-correlated systems.
Proceedings of the Conference on High Performance Computing Networking, 2011
2010
High-Performance Quantum Simulation for Coupled Josephson Junctions on the Earth Simulator: a Challenge To the Schrödinger Equation On 256⁴ Grids.
Int. J. High Perform. Comput. Appl., 2010
2009
Narrow-band reduction approach of a DRSM eigensolver on a multicore-based cluster system.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009
2007
Recursive multi-factoring algorithm for MPI allreduce.
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, 2007
2006
Gordon Bell finalists I - High-performance computing for exact numerical approaches to quantum many-body problems on the earth simulator.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
2005
16.447 TFlops and 159-Billion-dimensional Exact-diagonalization for Trapped Fermion-Hubbard Model on the Earth Simulator.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005
10TFLOPS Eigenvalue Solver for Strongly-Correlated Fermions on the Earth Simulator.
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, 2005
C-Stab: Cache Stabilizing Algorithm for a Numerical Library.
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, 2005
Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005
Automatic Tuning Technique Exploring Within the Hardware-Specific Constrained Parameters.
Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005
16.14 TFLOPS Eigenvalue Solver on the Earth Simulator: Exact Diagonalization for Ultra Largescale Hamiltonian Matrix.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005
2003
Proceedings of the Parallel and Distributed Processing and Applications, 2003
A Visual Resource Integration Environment for Distributed Applications on the ITBL System.
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
2002
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September 29, 2002
2000
An Estimation of Complexity and Computational Costs for Vertical Block-Cyclic Distributed Parallel LU Factorization.
J. Supercomput., 2000
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2000