Toshihiro Hanawa

ORCID: 0000-0002-2970-6037

According to our database, Toshihiro Hanawa authored at least 67 papers between 1994 and 2024.

Bibliography

2024
MPI-Adapter2: An Automatic ABI Translation Library Builder for MPI Application Binary Portability.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024

Optimize Efficiency of Utilizing Systems by Dynamic Core Binding.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024

Power-Efficiency Variation on A64FX Supercomputers and its Application to System Operation.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Preliminary Performance Evaluation of Grace-Hopper GH200.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
Dynamic Core Binding for Load Balancing of Applications Parallelized with MPI/OpenMP.
Proceedings of the Computational Science - ICCS 2023, 2023

2022
mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.
CoRR, 2022

A System-Wide Communication to Couple Multiple MPI Programs for Heterogeneous Computing.
Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2022

A Process Management Runtime with Dynamic Reconfiguration.
Proceedings of the HPCAsia 2022 Workshop: International Conference on High Performance Computing in Asia-Pacific Region Workshops, Virtual Event Japan, January 11, 2022

Offloading Transprecision Calculation Using FPGA.
Proceedings of the HPCAsia 2022 Workshop: International Conference on High Performance Computing in Asia-Pacific Region Workshops, Virtual Event Japan, January 11, 2022


Optimizations of H-matrix-vector Multiplication for Modern Multi-core Processors.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Automatic Graph Partitioning for Very Large-scale Deep Learning.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

2020
Development of training environment for deep learning with medical images on supercomputer system based on asynchronous parallel Bayesian optimization.
J. Supercomput., 2020

Footprint-Aware Power Capping for Hybrid Memory Based Systems.
Proceedings of the High Performance Computing - 35th International Conference, 2020

Distribution System for Japanese Synthetic Population Data with Protection Level.
Proceedings of the International Conference on Machine Learning and Cybernetics, 2020

Analysis of Cooling Water Temperature Impact on Computing Performance and Energy Consumption.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster.
Int. J. High Perform. Comput. Appl., 2019

2018
10-Gbps Real-time Burst-Frame Synchronization Using Dual-Stage Detection for Full-Software Optical Access Systems.
Proceedings of the Optical Fiber Communications Conference and Exposition, 2018

Coherent Receiver DSP Implemented on a General-Purpose Server for Full Software-Defined Optical Access.
Proceedings of the Optical Fiber Communications Conference and Exposition, 2018

Design of Parallel BEM Analyses Framework for SIMD Processors.
Proceedings of the Computational Science - ICCS 2018, 2018

Scaling collectives on large clusters using Intel(R) architecture processors and fabric.
Proceedings of Workshops of HPC Asia 2018, 2018

Load-Balancing-Aware Parallel Algorithms of H-Matrices with Adaptive Cross Approximation for GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017
Communication-Computation Overlapping with Dynamic Loop Scheduling for Preconditioned Parallel Iterative Solvers on Multicore and Manycore Clusters.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

Performance Evaluation of PEACH3: Field-Programmable Gate Array Switch for Tightly Coupled Accelerators.
Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2017

2016
Implementation and Evaluation of NAS Parallel CG Benchmark on GPU Cluster with Proprietary Interconnect TCA.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016

From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

2015
Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Reduction calculator in an FPGA based switching Hub for high performance clusters.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Evaluation of FFT for GPU Cluster Using Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

Parallelization of cipher algorithm on CPU/GPU for real-time software-defined access network.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
PEACH2: An FPGA-based PCIe network device for Tightly Coupled Accelerators.
SIGARCH Comput. Archit. News, 2014

XcalableACC: extension of XcalableMP PGAS language using OpenACC for accelerator clusters.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

A Preliminarily Evaluation of PEACH3: A Switching Hub for Tightly Coupled Accelerators.
Proceedings of the Second International Symposium on Computing and Networking, 2014

QCD Library for GPU Cluster with Proprietary Interconnect for GPU Direct Communication.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Interconnection Network for Tightly Coupled Accelerators Architecture.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Task level pipelining with PEACH2: An FPGA switching fabric for high performance computing.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013

2012
GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

DS-Bench Toolset: Tools for dependability benchmarking with simulation and assurance.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

2011
Peach: A Multicore Communication System on Chip with PCI Express.
IEEE Micro, 2011

An 80Gb/s dependable communication SoC with PCI express I/F and 8 CPUs.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

PEARL and PEACH: A Novel PCI Express Direct Link and Its Implementation.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

XMCAPI: Inter-core Communication Interface on Multi-chip Embedded Systems.
Proceedings of the IEEE/IFIP 9th International Conference on Embedded and Ubiquitous Computing, 2011

An 80 Gbps dependable multicore communication SoC with PCI express I/F and intelligent interrupt controller.
Proceedings of the 2011 IEEE Symposium on Low-Power and High-Speed Chips, 2011

2010
Customizing Virtual Machine with Fault Injector by Integrating with SpecC Device Model for a Software Testing Environment D-Cloud.
Proceedings of the 16th IEEE Pacific Rim International Symposium on Dependable Computing, 2010

Large-Scale Software Testing Environment Using Cloud Computing Technology for Dependable Parallel and Distributed Systems.
Proceedings of the Third International Conference on Software Testing, 2010

PEARL: Power-Aware, Dependable, and High-Performance Communication Link Using PCI Express.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

D-Cloud: Design of a Software Testing Environment for Reliable Distributed Systems Using Cloud Computing Technology.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Evaluation of Multicore Processors for Embedded Systems by Parallel Benchmark Program Using OpenMP.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

Towards an Open Dependable Operating System.
Proceedings of the 2009 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2009

RI2N/DRV: Multi-link ethernet for high-bandwidth and fault-tolerant network on PC clusters.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Flexible Multi-link Ethernet Binding System for PC Clusters with Asymmetric Topology.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

2008
A dynamic routing control system for high-performance PC cluster with multi-path Ethernet connection.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

RI2N: High-bandwidth and fault-tolerant network with multi-link Ethernet for PC clusters.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2005
The performance of SNAIL-2 (a SSS-MIN connected multiprocessor with cache coherent mechanism).
Parallel Comput., 2005

Implementation of ISIS-SimpleScalar.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2005

2003
Performance Evaluation of 3-Dimensional MIN with Cache Consistency Maintenance Mechanism.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2003

1999
Performance evaluation of SNAIL: A multiprocessor based on the simple serial synchronized multistage interconnection network architecture.
Parallel Comput., 1999

1998
The MINC (Multistage Interconnection Network with Cache Control Mechanism) Chip.
Proceedings of the ASP-DAC '98, 1998

1997
Adaptive Routing on the Recursive Diagonal Torus.
Proceedings of the High Performance Computing, International Symposium, 1997

1996
Hot spot contention and message combining in the simple serial synchronized multistage interconnection network.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

1995
An analysis of the hot spot contention and message combining on the simple serial synchronized-multistage interconnection network.
Syst. Comput. Jpn., 1995

1994
Multistage Interconnection Networks with Multiple Outlets.
Proceedings of the 1994 International Conference on Parallel Processing, 1994

