Yuetsu Kodama

Orcid: 0000-0001-5787-0363

According to our database, Yuetsu Kodama authored at least 75 papers between 1989 and 2023.

Bibliography

2023
At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads.
ACM Trans. Archit. Code Optim., December, 2023

Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks.
IEICE Trans. Electron., June, 2023

2022
Co-Design and System for the Supercomputer "Fugaku".
IEEE Micro, 2022

At the Locus of Performance: A Case Study in Enhancing CPUs with Copious 3D-Stacked Cache.
CoRR, 2022

2021
Performance and power consumption analysis of Arm Scalable Vector Extension.
J. Supercomput., 2021

Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Power/Performance/Area Evaluations for Next-Generation HPC Processors using the A64FX Chip.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2021

Evaluation of SPEC CPU and SPEC OMP on the A64FX.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
The gem5 Simulator: Version 20.0+.
CoRR, 2020

Co-design for A64FX manycore processor and "Fugaku".
Proceedings of the International Conference for High Performance Computing, 2020

Accuracy Improvement of Memory System Simulation for Modern Shared Memory Processor.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

Preliminary Performance Evaluation of the Fujitsu A64FX Using HPC Applications.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Performance Evaluation of Supercomputer Fugaku using Breadth-First Search Benchmark in Graph500.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Evaluation of Power Management Control on the Supercomputer Fugaku.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Evaluation of the RIKEN Post-K Processor Simulator.
CoRR, 2019

2018
Power performance analysis of ARM scalable vector extension.
Proceedings of the 2018 IEEE Symposium in Low-Power and High-Speed Chips, 2018

2017
Preliminary Performance Evaluation of Application Kernels Using ARM SVE with Multiple Vector Lengths.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2015
Implementation of CG Method on GPU Cluster with Proprietary Interconnect TCA for GPU Direct Communication.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014
PEACH2: An FPGA-based PCIe network device for Tightly Coupled Accelerators.
SIGARCH Comput. Archit. News, 2014

XcalableACC: extension of XcalableMP PGAS language using OpenACC for accelerator clusters.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

A Preliminarily Evaluation of PEACH3: A Switching Hub for Tightly Coupled Accelerators.
Proceedings of the Second International Symposium on Computing and Networking, 2014

QCD Library for GPU Cluster with Proprietary Interconnect for GPU Direct Communication.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
Imbalance of CPU temperatures in a blade system and its impact for power consumption of fans.
Clust. Comput., 2013

Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

The Flexible Sound Synthesizer on an FPGA.
Proceedings of the First International Symposium on Computing and Networking, 2013

Interconnection Network for Tightly Coupled Accelerators Architecture.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

The study of three-dimensional multiphase-flow simulator.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

2011
High-Resolution Timer-Based Packet Pacing Mechanism on the Linux Operating System.
IEICE Trans. Commun., 2011

2010
Power consumption and efficiency of cooling in a Data Center.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010

Power Reduction Scheme of Fans in a Blade System by Considering the Imbalance of CPU Temperatures.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

2009
Metroflux: A high performance system for analysing flow at very fine-grain.
Proceedings of the 5th International ICST Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities, 2009

2008
The design and implementation of MPI collective operations for clusters in long-and-fast networks.
Clust. Comput., 2008

High Performance Relay Mechanism for MPI Communication Libraries Run on Multiple Private IP Address Clusters.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
Dependable communication using multiple network paths on fast long-distance networks.
Syst. Comput. Jpn., 2007

Effects of packet pacing for MPI programs in a Grid environment.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
TCP Adaptation for MPI on Long-and-Fat Networks.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004
The Second Trans-Pacific Grid Datafarm Testbed and Experiments for SC2003.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

GNET-1: gigabit Ethernet network testbed.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
Design and implementation of PVFS-PM: a cluster file system on SCore.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2001
Tolerating Communication Latency through Dynamic Thread Invocation in a Multithreaded Architecture.
Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

1999
Communication Studies of Single-Threaded and Multithreaded Distributed-Memory Multiprocessors.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1998
Fast Speculative Search Engine on the Highly Parallel Computer EM-X.
Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '98), 1998

Highly Efficient Implementation of MPI Point-to-Point Communication Using Remote Memory Operations.
Proceedings of the 12th international conference on Supercomputing, 1998

Load Balanced Parallel Radix Sort.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
Fine-Grain Multithreading with the EM-X Multiprocessor.
Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, 1997

Experience with Fine-Grain Communication in EM-X Multiprocessor for Parallel Sparse Matrix Computation.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

Parallel Execution of Radix Sort Program Using Fine-Grain Communication.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996
Identifying the capability of overlapping computation with communication.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
Reduced Interprocessor-Communication Architecture and its Implementation on EM-4.
Parallel Comput., 1995

The EM-X Parallel Computer: Architecture and Basic Performance.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

A Macrotask-level Unlimited Speculative Execution on Multiprocessors.
Proceedings of the 9th international conference on Supercomputing, 1995

1994
Programming with Distributed Data Structure for EM-X Multiprocessor.
Proceedings of the Theory and Practice of Parallel Programming, 1994

Parallel bidirectional heuristic search on the EM-4 multiprocessor.
Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994

Nonnumeric search results on the EM-4 distributed-memory multiprocessor.
Proceedings of Supercomputing '94, 1994

Message-based efficient remote memory access on a highly parallel computer EM-X.
Proceedings of the International Symposium on Parallel Architectures, 1994

Experience with Executing Shared Memory Programs using Fine-Grain Communication and Multithreading in EM-4.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

EM-C: Programming with Explicit Parallelism and Locality for EM-4 Multiprocessor.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

1993
Evaluation of parallel execution performance by highly parallel computer EM-4.
Syst. Comput. Jpn., 1993

Design and Implementation of a Circular Omega Network in the EM-4.
Parallel Comput., 1993

RICA: Reduced Interprocessor-Communication Architecture - Concept and Mechanisms.
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

Super-Threading: Architectural and Software Mechanisms for Optimizing Parallel Computation.
Proceedings of the 7th international conference on Supercomputing, 1993

EMC-Y: Parallel Processing Element Optimizing Communication and Computation.
Proceedings of the 7th international conference on Supercomputing, 1993

1992
Methodologies in development and testing of the dataflow machine EM-4.
Parallel Comput., 1992

A prototype of a highly parallel dataflow machine EM-4 and its preliminary evaluation.
Future Gener. Comput. Syst., 1992

Thread-based Programming for the EM-4 Hybrid Dataflow Machine.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Evaluation of the EM-4 Highly Parallel Computer using a Game Tree Searching Problem.
Proceedings of the International Conference on Fifth Generation Computer Systems. FGCS 1992, 1992

1991
Load balancing by function distribution on the EM-4 prototype.
Proceedings of Supercomputing '91, 1991

Prototype Implementation of a Highly Parallel Dataflow Machine EM-4.
Proceedings of the Fifth International Parallel Processing Symposium, Anaheim, California, USA, April 30, 1991

Design and Implementation of a Versatile Interconnection Network in the EM-4.
Proceedings of the International Conference on Parallel Processing, 1991

1989
An Architecture of a Dataflow Single Chip Processor.
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

An Architectural Design of a Highly Parallel Dataflow Machine.
Proceedings of the Information Processing 89, Proceedings of the IFIP 11th World Computer Congress, San Francisco, USA, August 28, 1989

