Mitsuhisa Sato

ORCID: 0000-0003-0543-7116

According to our database, Mitsuhisa Sato authored at least 257 papers between 1987 and 2024.

Bibliography

2024
Correction: Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication.
J. Supercomput., November, 2024

Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication.
J. Supercomput., September, 2024

Large-scale and cooperative graybox parallel optimization on the supercomputer Fugaku.
J. Parallel Distributed Comput., 2024

Quantum-centric supercomputing for materials science: A perspective on challenges and future directions.
Future Gener. Comput. Syst., 2024

Massively parallel CMA-ES with increasing population.
CoRR, 2024

Revolutionizing MRI Data Processing Using FSL: Preliminary Findings with the Fugaku Supercomputer.
CoRR, 2024

CORTEX: Large-Scale Brain Simulator Utilizing Indegree Sub-Graph Decomposition on Fugaku Supercomputer.
CoRR, 2024

Enhancing the Parallel UC2B Framework: Approach Validation and Scalability Study.
Proceedings of the Computational Science - ICCS 2024, 2024

Advancements in Traffic Simulations with multiMATSim's Distributed Framework.
Proceedings of the 16th International Conference on Agents and Artificial Intelligence, 2024

An Overview on Mixing MPI and OpenMP Dependent Tasking on A64FX.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024

Enhancing Large Scale Brain Simulation with Optimized Parallel Algorithms on Fugaku Supercomputer.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks.
IEICE Trans. Electron., June, 2023

OpenACC Unified Programming Environment for Multi-hybrid Acceleration with GPU and FPGA.
Proceedings of the High Performance Computing, 2023

OpenACC Execution Models for Manycore Processor with ARM SVE.
Proceedings of the HPC Asia 2023 Workshops, 2023

Performance improvement by enhancing spatial parallelism on FPGA for HPC applications.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
Co-Design and System for the Supercomputer "Fugaku".
IEEE Micro, 2022

The Supercomputer "Fugaku".
Proceedings of the 2022 International Symposium on VLSI Design, Automation and Test, 2022

Pushing the Frontier in the Design of Laser-Based Electron Accelerators with Groundbreaking Mesh-Refined Particle-In-Cell Simulations on Exascale-Class Supercomputers.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Design and Performance Evaluation of UCX for Tofu-D Interconnect with OpenSHMEM-UCX on Fugaku.
Proceedings of the IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X, 2022

Scaling the PageRank Algorithm for Very Large Graphs on the Fugaku Supercomputer.
Proceedings of the Computational Science - ICCS 2022, 2022

Performance tuning of the Helmholtz matrix-vector product kernel in the computational fluid dynamics solver Nek5000/RS for the A64FX processor.
Proceedings of the HPCAsia 2022 Workshop: International Conference on High Performance Computing in Asia-Pacific Region Workshops, Virtual Event Japan, January 11, 2022

Performance analysis of a state vector quantum circuit simulation on A64FX processor.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
A new sustained system performance metric for scientific performance evaluation.
J. Supercomput., 2021

Performance and power consumption analysis of Arm Scalable Vector Extension.
J. Supercomput., 2021

Graph optimization algorithm for low-latency interconnection networks.
Parallel Comput., 2021

Performance of the Supercomputer Fugaku for Breadth-First Search in Graph500 Benchmark.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Power/Performance/Area Evaluations for Next-Generation HPC Processors using the A64FX Chip.
Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2021

Performance Evaluation and Analysis of A64FX many-core Processor for the Fiber Miniapp Suite.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Evaluation of SPEC CPU and SPEC OMP on the A64FX.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Sequences of Sparse Matrix-Vector Multiplication on Fugaku's A64FX processors.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Multi-SPMD Programming Model with YML and XcalableMP.
Proceedings of the XcalableMP PGAS Programming Language, 2021

Hybrid-View Programming of Nuclear Fusion Simulation Code in XcalableMP.
Proceedings of the XcalableMP PGAS Programming Language, 2021

XcalableMP 2.0 and Future Directions.
Proceedings of the XcalableMP PGAS Programming Language, 2021

XcalableMP Programming Model and Language.
Proceedings of the XcalableMP PGAS Programming Language, 2021

2020
InKS: a programming model to decouple algorithm from optimization in HPC codes.
J. Supercomput., 2020

Design and evaluation of efficient global data movement in partitioned global address space.
Parallel Comput., 2020

Co-design for A64FX manycore processor and "Fugaku".
Proceedings of the International Conference for High Performance Computing, 2020

The Supercomputer "Fugaku" and Arm-SVE enabled A64FX processor for energy-efficiency and sustained application performance.
Proceedings of the 19th International Symposium on Parallel and Distributed Computing, 2020

Parallelization of All-Pairs-Shortest-Path Algorithms in Unweighted Graph.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

Accuracy Improvement of Memory System Simulation for Modern Shared Memory Processor.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

Preliminary Performance Evaluation of the Fujitsu A64FX Using HPC Applications.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Performance Evaluation of Supercomputer Fugaku using Breadth-First Search Benchmark in Graph500.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Evaluation of Power Management Control on the Supercomputer Fugaku.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster.
Int. J. High Perform. Comput. Appl., 2019

Implementation and evaluation of the HPC challenge benchmark in the XcalableMP PGAS language.
Int. J. High Perform. Comput. Appl., 2019

Evaluation of the RIKEN Post-K Processor Simulator.
CoRR, 2019

OpenMP Task Generation for Batched Kernel APIs.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Scalable communication performance prediction using auto-generated pseudo MPI event trace.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

A Method for Order/Degree Problem Based on Graph Symmetry and Simulated Annealing with MPI/OpenMP Parallelization.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

Multi-accelerator extension in OpenMP based on PGAS model.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

Distributed and Parallel Programming Paradigms on the K computer and a Cluster.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

SCore.
Proceedings of the Operating Systems for Supercomputers and High Performance Computing, 2019

2018
Corrigendum: Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers.
Frontiers Neuroinformatics, 2018

Extremely Scalable Spiking Neuronal Network Simulation Code: From Laptops to Exascale Computers.
Frontiers Neuroinformatics, 2018

MACC: An OpenACC Transpiler for Automatic Multi-GPU Use.
Proceedings of the Supercomputing Frontiers - 4th Asian Conference, 2018

Design of Data Management for Multi SPMD Workflow Programming Model.
Proceedings of the 4th International Workshop on Extreme Scale Programming Models and Middleware, 2018

Trade-Off of Offloading to FPGA in OpenMP Task-Based Programming.
Proceedings of the Evolving OpenMP for Evolving Architectures, 2018

The Impact of Taskyield on the Design of Tasks Communicating Through MPI.
Proceedings of the Evolving OpenMP for Evolving Architectures, 2018

Metaprogramming Framework for Existing HPC Languages Based on the Omni Compiler Infrastructure.
Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Multi-tasking Execution in PGAS Language XcalableMP and Communication Optimization on Many-core Clusters.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

Performance evaluation for a hydrodynamics application in XcalableACC PGAS language for accelerated clusters.
Proceedings of the Workshops of HPC Asia 2018, 2018

Performance evaluation for omni XcalableMP compiler on many-core cluster system based on knights landing.
Proceedings of the Workshops of HPC Asia 2018, 2018

Linkage of XcalableMP and Python languages for high productivity on HPC cluster system: application to graph order/degree problem.
Proceedings of the Workshops of HPC Asia 2018, 2018

A Source-to-Source Translation of Coarray Fortran with MPI for High Performance.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

High-productivity Programming and Optimization Framework for Stream Processing on FPGA.
Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2018

InKS, a Programming Model to Decouple Performance from Algorithm in HPC Codes.
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

Power performance analysis of ARM scalable vector extension.
Proceedings of the 2018 IEEE Symposium in Low-Power and High-Speed Chips, 2018

2017
Preliminary Performance Evaluation of Coarray-based Implementation of Fiber Miniapp Suite using XcalableMP PGAS Language.
Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017

Extending OpenMP SIMD Support for Target Specific Code and Application to ARM SVE.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

A Performance Projection of Mini-Applications onto Benchmarks Toward the Performance Projection of Real-Applications.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Implementing Lattice QCD Application with XcalableACC Language on Accelerated Cluster.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Preliminary Performance Evaluation of Application Kernels Using ARM SVE with Multiple Vector Lengths.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Implementation and Evaluation of One-sided PGAS Communication in XcalableACC for Accelerated Clusters.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP.
Parallel Comput., 2016

Design and Preliminary Evaluation of Omni OpenACC Compiler for Massive MIMD Processor PEZY-SC.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

OpenMP Extension for Explicit Task Allocation on NUMA Architecture.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

2015
Fault tolerance features of a new multi-SPMD programming/execution environment.
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015

Hybrid Communication with TCA and InfiniBand on a Parallel Programming Language XcalableACC for GPU Clusters.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Towards Unification of Accelerated Computing and Interconnection For Extreme-Scale Computing.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

2014
PEACH2: An FPGA-based PCIe network device for Tightly Coupled Accelerators.
SIGARCH Comput. Archit. News, 2014

Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
J. Comput. Chem., 2014

XcalableACC: extension of XcalableMP PGAS language using OpenACC for accelerator clusters.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

Victim Selection and Distributed Work Stealing Performance: A Case Study.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Hybrid-view programming of nuclear fusion simulation code in the PGAS parallel programming language XcalableMP.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Grid-Oriented Process Clustering System for Partial Message Logging.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

A Design of a Communication Library between Multiple Sets of MPI Processes for MPMD.
Proceedings of the 17th IEEE International Conference on Computational Science and Engineering, 2014

A PGAS Execution Model for Efficient Stencil Computation on Many-Core Processors.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

2013
A communication library between multiple sets of MPI processes for a MPMD model.
Proceedings of the 20th European MPI Users's Group Meeting, 2013

XMP-IO function and its application to MapReduce on the K computer.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Model Checking Stencil Computations Written in a Partitioned Global Address Space Language.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Multiple-SPMD Programming Environment Based on PGAS and Workflow toward Post-petascale Computing.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Adaptive Task Size Control on High Level Programming for GPU/CPU Work Sharing.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Interconnection Network for Tightly Coupled Accelerators Architecture.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

A Source-to-Source OpenACC Compiler for CUDA.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

2012
Preface.
J. Supercomput., 2012

Audit: A new synchronization API for the GET/PUT protocol.
J. Parallel Distributed Comput., 2012

Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing.
IEICE Trans. Inf. Syst., 2012

Auto-tuning of Numerical Programs by Block Multi-color Ordering Code Generation and Job-Level Parallel Execution.
Proceedings of the High Performance Computing for Computational Science, 2012

Implementation of XcalableMP Device Acceleration Extension with OpenCL.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

On-the-Fly Synchronization Checking for Interactive Programming in XcalableMP.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

DS-Bench Toolset: Tools for dependability benchmarking with simulation and assurance.
Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

An asynchronous parallel genetic algorithm for the maximum likelihood phylogenetic tree search.
Proceedings of the IEEE Congress on Evolutionary Computation, 2012

Productivity and Performance of Global-View Programming with XcalableMP PGAS Language.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Preface.
Proceedings of the International Conference on Computational Science, 2011

The International Exascale Software Project roadmap.
Int. J. High Perform. Comput. Appl., 2011

A distributed architecture of Sensing Web for sharing open sensor nodes.
Future Gener. Comput. Syst., 2011

Advanced Institute for Computational Science (AICS): Japanese National High-Performance Computing Research Institute and its 10-petaflops supercomputer "K".
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Experience Using Lazy Task Creation in OpenMP Task for the UTS Benchmark.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

An 80Gb/s dependable communication SoC with PCI express I/F and 8 CPUs.
Proceedings of the IEEE International Solid-State Circuits Conference, 2011

Audit: New Synchronization for the GET/PUT Protocol.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

PEARL and PEACH: A Novel PCI Express Direct Link and Its Implementation.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Efficient Work-Stealing Strategies for Fine-Grain Task Parallelism.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

An Extension of XcalableMP PGAS Language for Multi-node GPU Clusters.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

XMCAPI: Inter-core Communication Interface on Multi-chip Embedded Systems.
Proceedings of the IEEE/IFIP 9th International Conference on Embedded and Ubiquitous Computing, 2011

An 80 Gbps dependable multicore communication SoC with PCI express I/F and intelligent interrupt controller.
Proceedings of the 2011 IEEE Symposium on Low-Power and High-Speed Chips, 2011

2010
Customizing Virtual Machine with Fault Injector by Integrating with SpecC Device Model for a Software Testing Environment D-Cloud.
Proceedings of the 16th IEEE Pacific Rim International Symposium on Dependable Computing, 2010

XcalableMP implementation and performance of NAS Parallel Benchmarks.
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

Large-Scale Software Testing Environment Using Cloud Computing Technology for Dependable Parallel and Distributed Systems.
Proceedings of the Third International Conference on Software Testing, 2010

Implementation and Performance Evaluation of XcalableMP: A Parallel Programming Language for Distributed Memory Systems.
Proceedings of the 39th International Conference on Parallel Processing, 2010

PEARL: Power-Aware, Dependable, and High-Performance Communication Link Using PCI Express.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

Keynote.
Proceedings of the 13th IEEE International Conference on Computational Science and Engineering, 2010

Runtime Energy Adaptation with Low-Impact Instrumented Code in a Power-Scalable Cluster System.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

D-Cloud: Design of a Software Testing Environment for Reliable Distributed Systems Using Cloud Computing Technology.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

A Faceted-Navigation System for QCDml Ensemble XML Data.
Proceedings of the 3PGCIC 2010, 2010

2009
Large Scale Distributed and Parallel Computing for Linear Algebra Problems: Practice and Experience.
Proceedings of the Parallel Programming, Models and Applications in Grid and P2P Systems, 2009

Programmability Issues.
Int. J. High Perform. Comput. Appl., 2009

Resolution of large symmetric eigenproblems on a world-wide grid.
Int. J. Grid Util. Comput., 2009

Evaluation of Multicore Processors for Embedded Systems by Parallel Benchmark Program Using OpenMP.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

Towards an Open Dependable Operating System.
Proceedings of the 2009 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, 2009

RI2N/DRV: Multi-link ethernet for high-bandwidth and fault-tolerant network on PC clusters.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Performance Evaluation of OpenMP and MPI Hybrid Programs on a Large Scale Multi-core Multi-socket Cluster, T2K Open Supercomputer.
Proceedings of the ICPPW 2009, 2009

Flexible Multi-link Ethernet Binding System for PC Clusters with Asymmetric Topology.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Reliable Software Distributed Shared Memory Using Page Migration.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Power and QoS performance characteristics of virtualized servers.
Proceedings of the 2009 10th IEEE/ACM International Conference on Grid Computing, 2009

Using a cluster as a memory resource: A fast and large virtual memory on MPI.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
Guest Editors' Introduction: Special Issue on OpenMP.
Int. J. Parallel Program., 2008

Integrating Computing Resources on Multiple Grid-Enabled Job Scheduling Systems Through a Grid RPC System.
J. Grid Comput., 2008

A parallel method for large sparse generalized eigenvalue problems using a GridRPC system.
Future Gener. Comput. Syst., 2008

Power management of distributed web servers by controlling server power state and traffic prediction for QoS.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

OpenMPD: A Directive-Based Data Parallel Language Extension for Distributed Memory Systems.
Proceedings of the 37th International Conference on Parallel Processing, 2008

Performance Evaluation of Data Management Layer by Data Sharing Patterns for Grid RPC Applications.
Proceedings of the Euro-Par 2008, 2008

RI2N: High-bandwidth and fault-tolerant network with multi-link Ethernet for PC clusters.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

DLM: A distributed Large Memory System using remote memory swapping over cluster nodes.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Runtime DVFS control with instrumented Code in power-scalable cluster system.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007
Design and Implementation of OpenMPD: An OpenMP-Like Programming Language for Distributed Memory Systems.
Proceedings of A Practical Programming Model for the Multi-Core Era, 2007

Direct Execution of Linux Binary on Windows for Grid RPC Workers.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

RI2N/UDP: High bandwidth and fault-tolerant network for a PC-cluster based on multi-link Ethernet.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Survey of Six Myths and Oversights about Distributed Hash Tables' Security.
Proceedings of the 27th International Conference on Distributed Computing Systems Workshops (ICDCS 2007 Workshops), 2007

Bandwidth-Aware Design of Large-Scale Clusters for Scientific Computations.
Proceedings of the High Performance Computing and Communications, 2007

Toward power-aware computing with dynamic voltage scaling for heterogeneous platforms.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Development and Implementation of an Interactive Parallelization Assistance Tool for OpenMP: iPat/OMP.
IEICE Trans. Inf. Syst., 2006

Editorial: Special Issue on Global and Peer-to-Peer Computing.
J. Grid Comput., 2006

Storage challenge - High performance data analysis for particle physics using the Gfarm file system.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Profile-based optimization of power performance by using dynamic voltage scaling on a PC cluster.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MegaProto/E: power-aware high-performance cluster with commodity technology.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A scalable communication layer for multi-dimensional hyper crossbar network using multiple gigabit ethernet.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Performance Improvement by Data Management Layer in a Grid RPC System.
Proceedings of the Advances in Grid and Pervasive Computing, 2006

Empirical study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a Power-scalable High Performance Cluster.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

PACS-CS: A Large-Scale Bandwidth-Aware PC Cluster for Scientific Computations.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
Editorial.
Parallel Comput., 2005

OpenGR: A directive-based grid programming environment.
Parallel Comput., 2005

MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Computation of High-Precision Mathematical Constants in a Combined Cluster and Grid Environment.
Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005

Design of a Software Distributed Shared Memory System using an MPI communication layer.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Low-cost High-bandwidth Tree Network for PC Clusters based on Tagged-VLAN Technology.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Empirical Study for Optimization of Power-Performance with On-Chip Memory.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

MegaProto: A Low-Power and Compact Cluster for High-Performance Computing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Grid Environment for Computational Astrophysics Driven by GRAPE-6 with HMCS-G and OmniRPC.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Grid and Cluster Matrix Computation with Persistent Storage and Out-of-core Programming.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

2004
SCIMA-SMP: on-chip memory processor architecture for SMP.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

The Second Trans-Pacific Grid Datafarm Testbed and Experiments for SC2003.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Heterogeneous Remote Computing System for Computational Astrophysics with OmniRPC.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Performance Evaluation of OmniRPC in a Grid Environment.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs.
Proceedings of the Applied Parallel Computing, 2004

A Parallel Method for Large Sparse Generalized Eigenvalue Problems by OmniRPC in a Grid Environment.
Proceedings of the Applied Parallel Computing, 2004

Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Implementation and performance evaluation of CONFLEX-G: grid-enabled molecular conformational space search program with OmniRPC.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Formation of Dwarf Galaxies in Reionized Universe with Heterogeneous Multi-computer System.
Proceedings of the Computational Science, 2004

2003
Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001 Benchmarks.
Int. J. Parallel Program., 2003

An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

RI2N - Interconnection Network System for Clusters with Wide-Bandwidth and Fault-Tolerancy Based on Multiple Links.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

OmniRPC: a Grid RPC System for Parallel Programming in Cluster and Grid Environment.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

Preliminary Evaluation of Dynamic Load Balancing Using Loop Re-partitioning on Omni/SCASH.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

Performance of Cluster-enabled OpenMP for the SCASH Software Distributed Shared Memory System.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
Exploiting cluster networks for distributed object groups and collective operations.
Future Gener. Comput. Syst., 2002

OpenMP: Parallel Programming API for Shared Memory Multiprocessors and On-Chip Multiprocessors.
Proceedings of the 15th International Symposium on System Synthesis (ISSS 2002), 2002

Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs.
Proceedings of the Euro-Par 2002, 2002

2001
Compiler optimization techniques for OpenMP programs.
Sci. Program., 2001

Cluster-enabled OpenMP: An OpenMP compiler for the SCASH software distributed shared memory system.
Sci. Program., 2001

TACO - Prototyping High-Level Object-Oriented Programming Constructs by Means of Template Based Programming Techniques.
ACM SIGPLAN Notices, 2001

OmniRPC: A Grid RPC Facility for Cluster and Global Computing in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

The Omni OpenMP Compiler on the Distributed Shared Memory of Cenju-4.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Tolerating Communication Latency through Dynamic Thread Invocation in a Multithreaded Architecture.
Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

TACO-Exploiting Cluster Networks for High-Level Collective Operations.
Proceedings of the First IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2001), 2001

2000
Network interface active messages for low overhead communication on SMP PC clusters.
Future Gener. Comput. Syst., 2000

Performance Evaluation of OpenMP Applications with Nested Parallelism.
Proceedings of the Languages, 2000

Impact of OpenMP Optimizations for the MGCG Method.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Performance Evaluation of the Omni OpenMP Compiler.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Template Based Structured Collections.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Performance Evaluation of a Firewall-Compliant Globus-based Wide-Area Cluster System.
Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing, 2000

Design Issues of Network Enabled Server Systems for the Grid.
Proceedings of the Grid Computing, 2000

TACO -- Dynamic Distributed Collections with Templates and Topologies.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
COMPaS: a PC-based SMP cluster.
IEEE Concurr., 1999

Design and implementations of Ninf: towards a global computing infrastructure.
Future Gener. Comput. Syst., 1999

Resource Manager for Globus-Based Wide-Area Cluster Computing.
Proceedings of the International Workshop on Cluster Computing (IWCC '99), 1999

A Comparison of Automatic Parallelizing Compiler and Improvements by Compiler Directives.
Proceedings of the High Performance Computing, Second International Symposium, 1999

Generic Programming for Parallel Mesh Problems.
Proceedings of the Computing in Object-Oriented Parallel Environments, 1999

Parallelization of Sparse Cholesky Factorization on an SMP Cluster.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

1998
Ninf and PM: Communication libraries for global computing and high-performance cluster computing.
Future Gener. Comput. Syst., 1998

Ninflet: a migratable parallel objects framework using Java.
Concurr. Pract. Exp., 1998

Janus: A C++ Template Library for Parallel Dynamic Mesh Applications.
Proceedings of the Computing in Object-Oriented Parallel Environments, 1998

COMPaS: A Pentium Pro PC-based SMP Cluster and Its Experience.
Proceedings of the Parallel and Distributed Processing, 10 IPPS/SPDP'98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, Orlando, Florida, USA, March 30, 1998

Utilizing the Metaserver Architecture in the Ninf Global Computing System.
Proceedings of the High-Performance Computing and Networking, 1998

Practical Simulation of Large-Scale Parallel Programs and Its Performance Analysis of the NAS Parallel Benchmarks.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

1997
Performance Evaluation of a Workstation Cluster, TMC CM-5, and Intel Paragon/XP Using a Parallel Homology Analysis Program.
Parallel Comput., 1997

Data and Workload Distribution in a Multithreaded Architecture.
J. Parallel Distributed Comput., 1997

Fine-Grain Multithreading with the EM-X Multiprocessor.
Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, 1997

Multi-client LAN/WAN Performance Analysis of Ninf: a High-Performance Global Computing System.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

Communication Performance of Gigabit LAN Workstation Cluster RWC/WSC.
Proceedings of the Parallel Computing: Fundamentals, 1997

A Compile-Time Meta-Level Architecture Supporting Class Specific Optimization.
Proceedings of the Scientific Computing in Object-Oriented Parallel Environments, 1997

Parallel Array Class Implementation Using C++ STL Adaptors.
Proceedings of the Scientific Computing in Object-Oriented Parallel Environments, 1997

A Framework for Parallel Adaptive Finite Element Methods and Its Template Based Implementation in CC++.
Proceedings of the Scientific Computing in Object-Oriented Parallel Environments, 1997

Experience with Fine-Grain Communication in EM-X Multiprocessor for Parallel Sparse Matrix Computation.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

PM: An Operating System Coordinated High Performance Communication Library.
Proceedings of the High-Performance Computing and Networking, 1997

Ninf: A Network Based Information Library for Global World-Wide Computing Infrastructure.
Proceedings of the High-Performance Computing and Networking, 1997

Experiences with the C++ Standard Template Library and MPI for a Parallel Particle Simulation Method.
Proceedings of the High-Performance Computing and Networking, 1997

Efficient Implementation of Portable C*-like Data-Parallel Library in C++.
Proceedings of the 1997 Advances in Parallel and Distributed Computing Conference (APDC '97), 1997

Parallel Execution of Radix Sort Program Using Fine-Grain Communication.
Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996
Effects of Multithreading on Data and Workload Distribution for Distributed-Memory Multiprocessors.
Proceedings of IPPS '96, 1996

Identifying the capability of overlapping computation with communication.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
Reduced Interprocessor-Communication Architecture and its Implementation on EM-4.
Parallel Comput., 1995

An Experience with Super-Linear Speedup Achieved by Parallel Computing on a Workstation Cluster: Parallel Calculation of Density of States of Large Scale Cyclic Polyacenes.
Parallel Comput., 1995

An Overview of MPC++ - Extended Abstract.
Proceedings of the Parallel Symbolic Languages and Systems, 1995

The EM-X Parallel Computer: Architecture and Basic Performance.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

A Macrotask-level Unlimited Speculative Execution on Multiprocessors.
Proceedings of the 9th international conference on Supercomputing, 1995

Multithreading with the EM-4 distributed-memory multiprocessor.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994
Programming with Distributed Data Structure for EM-X Multiprocessor.
Proceedings of the Theory and Practice of Parallel Programming, 1994

Parallel bidirectional heuristic search on the EM-4 multiprocessor.
Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994

Nonnumeric search results on the EM-4 distributed-memory multiprocessor.
Proceedings of Supercomputing '94, 1994

Message-based efficient remote memory access on a highly parallel computer EM-X.
Proceedings of the International Symposium on Parallel Architectures, 1994

Experience with Executing Shared Memory Programs using Fine-Grain Communication and Multithreading in EM-4.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

Progress Report on Porting Sisal to the EM-4 Multiprocessor.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

EM-C: Programming with Explicit Parallelism and Locality for EM-4 Multiprocessor.
Proceedings of the Parallel Architectures and Compilation Techniques, 1994

1993
RICA: Reduced Interprocessor-Communication Architecture - Concept and Mechanisms.
Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

Super-Threading: Architectural and Software Mechanisms for Optimizing Parallel Computation.
Proceedings of the 7th international conference on Supercomputing, 1993

Data Stream Control Optimization in Dataflow Architectures.
Proceedings of the 7th international conference on Supercomputing, 1993

EMC-Y: Parallel Processing Element Optimizing Communication and Computation.
Proceedings of the 7th international conference on Supercomputing, 1993

1992
Evaluation of range-checking addressing modes and the architecture of FLATS2.
Syst. Comput. Jpn., 1992

Thread-based Programming for the EM-4 Hybrid Dataflow Machine.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

1990
Multiple instruction streams in a highly pipelined processor.
Proceedings of the Second IEEE Symposium on Parallel and Distributed Processing, 1990

1989
Run-Time Checking in Lisp by Integrating Memory Addressing and Range Checking.
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

1987
A Hybrid algebraic-numeric system ANS and its preliminary implementation.
Proceedings of the EUROCAL '87, 1987
