Akira Naruse

Orcid: 0000-0002-3140-0854

According to our database, Akira Naruse authored at least 19 papers between 2002 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Preliminary Performance Evaluation of Grace-Hopper GH200.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
Custom 8-bit floating point value format for reducing shared memory bank conflict in approximate nearest neighbor search.
CoRR, 2023

Parallel Top-K Algorithms on GPU: A Comprehensive Study and New Methods.
Proceedings of the International Conference for High Performance Computing, 2023

2022
Scalable and Practical Natural Gradient for Large-Scale Deep Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

2020
Low-Order Finite Element Solver with Small Matrix-Matrix Multiplication Accelerated by AI-Specific Hardware for Crustal Deformation Computation.
Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020

Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020

2019
GPU Implementation of a Sophisticated Implicit Low-Order Finite Element Solver with FP21-32-64 Computation Using OpenACC.
Proceedings of the Accelerator Programming Using Directives - 6th International Workshop, 2019

Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs.
CoRR, 2018

A fast scalable implicit solver for nonlinear time-evolution earthquake city problem on low-ordered unstructured finite elements with artificial intelligence and transprecision computing.
Proceedings of the International Conference for High Performance Computing, 2018

A Fast Scalable Implicit Solver with Concentrated Computation for Nonlinear Time-Evolution Problems on Low-Order Unstructured Finite Elements.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

2016
Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers.
J. Comput. Chem., 2016

2013
Interference-aware Incoming Message Detection for MPI Threaded Progression.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Multiplexing aware arbiter physical unclonable function.
Proceedings of the IEEE 13th International Conference on Information Reuse & Integration, 2012

2009
The Design of Seamless MPI Computing Environment for Commodity-Based Clusters.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Adaptive Immune Algorithm Considering Intensification and Diversification.
Proceedings of the IEEE International Conference on Information Reuse and Integration, 2009

2002
Speeding Up Kernel Scheduler by Reducing Cache Misses.
Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, 2002

