Daichi Mukunoki

ORCID: 0000-0002-0051-6811

According to our database, Daichi Mukunoki authored at least 24 papers between 2010 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Reduced-Precision and Reduced-Exponent Formats for Accelerating Adaptive Precision Sparse Matrix-Vector Product.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

2023
Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor.
Proceedings of the 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2023

2022
Infinite-Precision Inner Product and Sparse Matrix-Vector Multiplication Using Ozaki Scheme with Dot2 on Manycore Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2022

2021
Task Scheduling Strategies for Batched Basic Linear Algebra Subprograms on Many-core CPUs.
Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow.
Proceedings of the Computational Science and Its Applications - ICCSA 2021, 2021

Conjugate Gradient Solvers with High Accuracy and Bit-wise Reproducibility between CPU and GPU using Ozaki scheme.
Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021

2020
Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs.
J. Comput. Appl. Math., 2020

White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.
CoRR, 2020

Can We Avoid Rounding-Error Estimation in HPC Codes and Still Get Trustworthy Results?
Proceedings of the Software Verification - 12th International Conference, 2020

DGEMM Using Tensor Cores, and Its Accurate and Reproducible Versions.
Proceedings of the High Performance Computing - 35th International Conference, 2020

2019
Reproducible BLAS Routines with Tunable Accuracy Using Ozaki Scheme for Many-Core Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Design of an FPGA-Based Matrix Multiplier with Task Parallelism.
Proceedings of the Parallel Computing: Technology Trends, 2019

2018
Performance Analysis of 2D-compatible 2.5D-PDGEMM on Knights Landing Cluster.
Proceedings of the Computational Science - ICCS 2018, 2018

2017
Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer.
Proceedings of the Parallel Processing and Applied Mathematics, 2017

Design Towards Modern High Performance Numerical LA Library Enabling Heterogeneity and Flexible Data Formats.
Proceedings of the Parallel Computing is Everywhere, 2017

2016
Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs.
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016

Reduced-Precision Floating-Point Formats on GPUs for High Performance and Energy Efficient Computation.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015
Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

2013
Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs.
Proceedings of the Computational Science and Its Applications - ICCSA 2013, 2013

2012
Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2010
Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs.
Proceedings of the Applied Parallel and Scientific Computing, 2010
