Kazuhiko Komatsu

ORCID: 0000-0003-4463-8359

According to our database, Kazuhiko Komatsu authored at least 67 papers between 1987 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
An Asymptotic Parallel Linear Solver and Its Application to Direct Numerical Simulation for Compressible Turbulence.
Proceedings of the Computational Science - ICCS 2024, 2024

File I/O Cache Performance of Supercomputer Fugaku Using an Out-of-Core Direct Numerical Simulation Code of Turbulence.
Proceedings of the Computational Science - ICCS 2024, 2024

2023
Ising-Based Kernel Clustering.
Algorithms, April, 2023

A dynamic parameter tuning method for SpMM parallel execution.
Concurr. Comput. Pract. Exp., 2023

Investigating the Characteristics of Ising Machines.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

A Constraint Partition Method for Combinatorial Optimization Problems.
Proceedings of the 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2023

Appropriate Graph-Algorithm Selection for Edge Devices Using Machine Learning.
Proceedings of the 16th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2023

I/O Performance Evaluation of a Memory-Saving DNS Code on SX-Aurora TSUBASA.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Multi-scale Loss based Electron Microscopic Image Pair Matching Method.
Proceedings of the International Conference on Machine Learning and Applications, 2023

Performance Evaluation of Tsunami Evacuation Route Planning on Multiple Annealing Machines.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022
A Metadata Prefetching Mechanism for Hybrid Memory Architectures.
IEICE Trans. Electron., 2022

Page-Address Coalescing of Vector Gather Instructions for Efficient Address Translation.
Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

A Partitioned Memory Architecture with Prefetching for Efficient Video Encoders.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

Analysis of Precision Vectors for Ising-Based Linear Regression.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

High-Performance GraphBLAS Backend Prototype for NEC SX-Aurora TSUBASA.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

2021
VGL: a high-performance graph processing framework for the NEC SX-Aurora TSUBASA vector architecture.
J. Supercomput., 2021

Optimizing Load Balance in a Parallel CFD Code for a Large-scale Turbine Simulation on a Vector Supercomputer.
Supercomput. Front. Innov., 2021

Performance and Power Analysis of a Vector Computing System.
Supercomput. Front. Innov., 2021

Distributed Graph Algorithms for Multiple Vector Engines of NEC SX-Aurora TSUBASA Systems.
Supercomput. Front. Innov., 2021

Efficient Mixed-Precision Tall-and-Skinny Matrix-Matrix Multiplication for GPUs.
Int. J. Netw. Comput., 2021

An External Definition of the One-Hot Constraint and Fast QUBO Generation for High-Performance Combinatorial Clustering.
Int. J. Netw. Comput., 2021

Register Flush-free Runahead Execution for Modern Vector Processors.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

Optimizations of a Linear Matrix Solver in a Composite Simulation for a Vector Computer.
Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Ising-Based Combinatorial Clustering Using the Kernel Method.
Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021

A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

An Externally-Constrained Ising Clustering Method for Material Informatics.
Proceedings of the Ninth International Symposium on Computing and Networking, 2021

2020
Xevolver: A code transformation framework for separation of system-awareness from application codes.
Concurr. Comput. Pract. Exp., 2020

A Dynamic Parameter Tuning Method for High Performance SpMM.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2020

A Deep Reinforcement Learning Based Feature Selector.
Proceedings of the Parallel Architectures, Algorithms and Programming, 2020

I/O Performance of the SX-Aurora TSUBASA.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Importance of Selecting Data Layouts in the Tsunami Simulation Code.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Workshop 14: iWAPT Automatic Performance Tuning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

An Efficient Skinny Matrix-Matrix Multiplication Method by Folding Input Matrices into Tensor Core Operations.
Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Combinatorial Clustering Based on an Externally-Defined One-Hot Constraint.
Proceedings of the Eighth International Symposium on Computing and Networking, 2020

Energy-efficient Design of an STT-RAM-based Hybrid Cache Architecture.
Proceedings of the 2020 IEEE Symposium in Low-Power and High-Speed Chips, 2020

Optimization of the Himeno Benchmark for SX-Aurora TSUBASA.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2020

2019
Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines.
Supercomput. Front. Innov., 2019

A Skewed Multi-banked Cache for Many-core Vector Processors.
Supercomput. Front. Innov., 2019

Optimizing Memory Layout of Hyperplane Ordering for Vector Supercomputer SX-Aurora TSUBASA.
Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019

A Hardware Prefetching Mechanism for Vector Gather Instructions.
Proceedings of the 9th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2019

Analysis of Relationship Between SIMD-Processing Features Used in NVIDIA GPUs and NEC SX-Aurora TSUBASA Vector Processors.
Proceedings of the Parallel Computing Technologies, 2019

An Appropriate Computing System and Its System Parameters Selection Based on Bottleneck Prediction of Applications.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Performance Evaluation of Tsunami Inundation Simulation on SX-Aurora TSUBASA.
Proceedings of the Computational Science - ICCS 2019, 2019

Perceptron-based Cache Bypassing for Way-Adaptable Caches.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019

2018
Developing Efficient Implementations of Bellman-Ford and Forward-Backward Graph Algorithms for NEC SX-ACE.
Supercomput. Front. Innov., 2018

Performance evaluation of a vector supercomputer SX-aurora TSUBASA.
Proceedings of the International Conference for High Performance Computing, 2018

Search Space Reduction for Parameter Tuning of a Tsunami Simulation on the Intel Knights Landing Processor.
Proceedings of the 12th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2018

Use of Code Structural Features for Machine Learning to Predict Effective Optimizations.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017
Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE.
J. Supercomput., 2017

A Directive Generation Approach to High Code-Maintainability for Various HPC Systems.
Int. J. Netw. Comput., 2017

An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Designing an Open Database of System-Aware Code Optimizations.
Proceedings of the Fifth International Symposium on Computing and Networking, 2017

A Memory Congestion-Aware MPI Process Placement for Modern NUMA Systems.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

Vectorization-Aware Loop Optimization with User-Defined Code Transformations.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Performance and Power Analysis of SX-ACE Using HP-X Benchmark Programs.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework.
Int. J. Netw. Comput., 2016

A Directive Generation Approach Using User-Defined Rules.
Proceedings of the Fourth International Symposium on Computing and Networking, 2016

2015
Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework.
Proceedings of the Third International Symposium on Computing and Networking, 2015

An energy-efficient dynamic memory address mapping mechanism.
Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

2014
A Compiler-Assisted OpenMP Migration Method Based on Automatic Parallelizing Information.
Proceedings of the Supercomputing - 29th International Conference, 2014

2011
A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
A Fast Ray-Tracing Using Bounding Spheres and Frustum Rays for Dynamic Scene Rendering.
IEICE Trans. Inf. Syst., 2010

Automatic Tuning of CUDA Execution Parameters for Stencil Processing.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
CheCUDA: A Checkpoint/Restart Tool for CUDA Applications.
Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

2006
Ray Tracing Hardware System Using Plane-Sphere Intersections.
Proceedings of the 2006 International Conference on Field Programmable Logic and Applications (FPL), 2006

1987
The Outline Procedure in Pattern Data Preparation for Vector-Scan Electron-Beam Lithography.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1987
