Rio Yokota
Orcid: 0000-0001-7573-7873
According to our database, Rio Yokota authored at least 90 papers between 2007 and 2024.
Bibliography
2024
IEEE Trans. Signal Inf. Process. over Networks, 2024
Int. J. High Perform. Comput. Appl., 2024
An inherently parallel ℋ²-ULV factorization for solving dense linear systems on GPUs.
Int. J. High Perform. Comput. Appl., 2024
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs.
CoRR, 2024
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities.
CoRR, 2024
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order.
CoRR, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024
Proceedings of the Computer Vision - ECCV 2024, 2024
2023
Cache Optimization and Performance Modeling of Batched, Small, and Rectangular Matrix Multiplication on Intel, AMD, and Fujitsu Processors.
ACM Trans. Math. Softw., September, 2023
Trans. Mach. Learn. Res., 2023
Trans. Mach. Learn. Res., 2023
The 2023 Society for Industrial and Applied Mathematics Conference on Computational Science and Engineering.
Comput. Sci. Eng., 2023
O(N) distributed direct factorization of structured dense matrices using runtime systems.
CoRR, 2023
Quantum Circuit Simulation by SGEMM Emulation on Tensor Cores and Automatic Precision Selection.
Proceedings of the High Performance Computing - 38th International Conference, 2023
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
Proceedings of the Platform for Advanced Scientific Computing Conference, 2023
O(N) distributed direct factorization of structured dense matrices using runtime systems.
Proceedings of the 52nd International Conference on Parallel Processing, 2023
Proceedings of the 52nd International Conference on Parallel Processing, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023
Towards real-time formula driven dataset feed for large scale deep learning training.
Proceedings of the High Performance Computing for Imaging 2023, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
2022
IEEE Trans. Pattern Anal. Mach. Intell., 2022
Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance.
Int. J. High Perform. Comput. Appl., 2022
Scalable Linear Time Dense Direct Solver for 3-D Problems without Trailing Sub-Matrix Dependencies.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022
Proceedings of the 4th ACM International Conference on Multimedia in Asia, 2022
OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching.
Proceedings of the 2022 International Conference on Robotics and Automation, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
ExaFMM: a high-performance fast multipole method library with C++ and Python interfaces.
J. Open Source Softw., 2021
CoRR, 2021
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
2020
Rich Information is Affordable: A Systematic Performance Analysis of Second-order Optimization Using K-FAC.
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
Effect of Mixed Precision Computing on H-Matrix Vector Multiplication in BEM Analysis.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020
2019
SIAM J. Sci. Comput., 2019
J. Inf. Process., 2019
Int. J. High Perform. Comput. Appl., 2019
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Optimization of Numerous Small Dense-Matrix-Vector Multiplications in H-Matrix Arithmetic on GPU.
Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019
Performance Optimizations and Analysis of Distributed Deep Learning with Approximated Second-Order Optimization Method.
Proceedings of the 48th International Conference on Parallel Processing, 2019
Large-Scale Distributed Second-Order Optimization Using Kronecker-Factored Approximate Curvature for Deep Convolutional Neural Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
A Performance Improvement Approach for Second-Order Optimization in Large Mini-batch Training.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
2018
Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs.
CoRR, 2018
Comput. Vis. Sci., 2018
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
2017
Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions.
Proceedings of the High Performance Computing - 32nd International Conference, 2017
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017
Evaluating the Compression Efficiency of the Filters in Convolutional Neural Networks.
Proceedings of the Artificial Neural Networks and Machine Learning - ICANN 2017, 2017
Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
2016
A performance model for the communication in fast multipole methods on high-performance computing platforms.
Int. J. High Perform. Comput. Appl., 2016
CoRR, 2016
A Matrix-free Preconditioner for the Helmholtz Equation based on the Fast Multipole Method.
CoRR, 2016
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
2014
Supercomput. Front. Innov., 2014
Petascale molecular dynamics simulation using the fast multipole method on K computer.
Comput. Phys. Commun., 2014
A Performance Model for the Communication in Fast Multipole Methods on HPC Platforms.
CoRR, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
2013
Petascale turbulence simulation using a highly parallel fast multipole method on GPUs.
Comput. Phys. Commun., 2013
Fork-Join and Data-Driven Execution Models on Multi-core Architectures: Case Study of the FMM.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
2012
A tuned and scalable fast multipole method as a preeminent algorithm for exascale systems.
Int. J. High Perform. Comput. Appl., 2012
Comput. Sci. Eng., 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 11th International Symposium on Parallel and Distributed Computing, 2012
2011
Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns.
Comput. Phys. Commun., 2011
Fast Multipole Method vs. Spectral Method for the Simulation of Isotropic Turbulence on GPUs.
CoRR, 2011
CoRR, 2011
2010
2009
Fast multipole methods on a cluster of GPUs for the meshless simulation of turbulence.
Comput. Phys. Commun., 2009
42 TFlops hierarchical N-body simulations on GPUs with applications in both astrophysics and turbulence.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
2007
J. Comput. Phys., 2007