Hiroyuki Takizawa

Kentaro Sano

Proceedings of the HEART 2022: International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, Tsukuba, Japan, June 9, 2022

mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.

Proceedings of the IEEE Intl. Conf. on Dependable, 2022

2021

OpenCL-like offloading with metaprogramming for SX-Aurora TSUBASA.

Parallel Comput., 2021

Preemptive Parallel Job Scheduling for Heterogeneous Systems Supporting Urgent Computing.

IEEE Access, 2021

Towards Conflict-Aware Workload Co-execution on SX-Aurora TSUBASA.

Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2021

Evaluating the Performance and Conformance of a SYCL Implementation for SX-Aurora TSUBASA.

Jiahao Li

Proceedings of the Parallel and Distributed Computing, Applications and Technologies, 2021

Spatiotemporal Anomaly Detection for Large-Scale Sensor Data.

Minglu Zhao

Tomoya Soma

Proceedings of the 12th International Symposium on Parallel Architectures, 2021

Portability of Vectorization-aware Performance Tuning Expertise across System Generations.

Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021

Evaluating I/O Acceleration Mechanisms of SX-Aurora TSUBASA.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

A memory bank conflict prevention mechanism for SYCL on SX-Aurora TSUBASA.

Proceedings of the Ninth International Symposium on Computing and Networking, 2021

neoSYCL: a SYCL implementation for SX-Aurora TSUBASA.

Yinan Ke

Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021

2020

ExaFSA: Parallel Fluid-Structure-Acoustic Simulation.

Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Improving Quantum Annealing Performance on Embedded Problems.

Supercomput. Front. Innov., 2020

Online MPI Process Mapping for Coordinating Locality and Memory Congestion on NUMA Systems.

Supercomput. Front. Innov., 2020

Xevolver: A code transformation framework for separation of system-awareness from application codes.

Concurr. Comput. Pract. Exp., 2020

DeLoc: A Locality and Memory-Congestion-Aware Task Mapping Method for Modern NUMA Systems.

IEEE Access, 2020

Exploiting the Potentials of the Second Generation SX-Aurora TSUBASA.

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

Task Priority Control for the HPX Runtime System.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Automatically Avoiding Memory Access Conflicts on SX-Aurora TSUBASA.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

A Conflict-Aware Capacity Control Mechanism for Last-Level Cache.

Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Improving the Accuracy in SpMV Implementation Selection with Machine Learning.

Proceedings of the Eighth International Symposium on Computing and Networking Workshops, 2020

Polymorphic Data Layout for SX-Aurora TSUBASA Vector Engines.

Proceedings of the Eighth International Symposium on Computing and Networking, 2020

Failure Prediction in Datacenters Using Unsupervised Multimodal Anomaly Detection.

Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

Comparison of Direct and Indirect Networks for High-Performance FPGA Clusters.

Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2020

2019

Performance Evaluation of Different Implementation Schemes of an Iterative Flow Solver on Modern Vector Machines.

Supercomput. Front. Innov., 2019

Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs.

IEICE Trans. Inf. Syst., 2019

Peachy Parallel Assignments (EduHPC 2019).

Steven Bogaerts

Arturo González-Escribano

Daniel A. Ellsworth

Jorge Fernández-Fabeiro

Sukhamay Kundu

Alina Lazar

Proceedings of the 2019 IEEE/ACM Workshop on Education for High-Performance Computing, 2019

An OpenCL-Like Offload Programming Framework for SX-Aurora TSUBASA.

Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019

An Automatic MPI Process Mapping Method Considering Locality and Memory Congestion on NUMA Systems.

Proceedings of the 13th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2019

Scaling Performance for N-Body Stream Computation with a Ring of FPGAs.

Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

The Impacts of Locality and Memory Congestion-aware Thread Mapping on Energy Consumption of Modern NUMA Systems.

Proceedings of the IEEE Symposium in Low-Power and High-Speed Chips, 2019

2018

A Machine Learning-Based Approach for Selecting SpMV Kernels and Matrix Storage Formats.

IEICE Trans. Inf. Syst., 2018

Use of Code Structural Features for Machine Learning to Predict Effective Optimizations.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Preconditioner Auto-Tuning Using Deep Learning for Sparse Iterative Algorithms.

Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Investigating the Effects of Dynamic Thread Team Size Adjustment for Irregular Applications.

Xiong Xiao

Proceedings of the Sixth International Symposium on Computing and Networking, 2018

Enhancing Memory Bandwidth in a Single Stream Computation with Multiple FPGAs.

Antoniette Mondigo

Kentaro Sano

Proceedings of the International Conference on Field-Programmable Technology, 2018

A Failure Prediction-Based Adaptive Checkpointing Method with Less Reliance on Temperature Monitoring for HPC Applications.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Automatic Hyperparameter Tuning of Machine Learning Models under Time Constraints.

Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Performance Estimation of Deeply Pipelined Fluid Simulation on Multiple FPGAs with High-speed Communication Subsystem.

Antoniette Mondigo

Kentaro Sano

Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018

2017

Potential of a modern vector supercomputer for practical applications: performance evaluation of SX-ACE.

J. Supercomput., 2017

Toward Dynamic Load Balancing across OpenMP Thread Teams for Irregular Workloads.

Int. J. Netw. Comput., 2017

A Directive Generation Approach to High Code-Maintainability for Various HPC Systems.

Int. J. Netw. Comput., 2017

Energy-Performance Modeling of Speculative Checkpointing for Exascale Systems.

IEICE Trans. Inf. Syst., 2017

Optimizing Energy Consumption on HPC Systems with a Multi-Level Checkpointing Mechanism.

Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

A Customizable Auto-Tuning Scenario with User-Defined Code Transformations.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

An Application-Level Incremental Checkpointing Mechanism with Automatic Parameter Tuning.

Kazuhiko Komatsu

Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Designing an Open Database of System-Aware Code Optimizations.

Kazuhiko Komatsu

Proceedings of the Fifth International Symposium on Computing and Networking, 2017

A Memory Congestion-Aware MPI Process Placement for Modern NUMA Systems.

Kazuhiko Komatsu

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

Vectorization-Aware Loop Optimization with User-Defined Code Transformations.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Performance and Power Analysis of SX-ACE Using HP-X Benchmark Programs.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

Xevtgen: Fortran code transformer generator for high performance scientific codes.

Reiji Suda

Shoichi Hirasawa

Int. J. Netw. Comput., 2016

Translation of Large-Scale Simulation Codes for an OpenACC Platform Using the Xevolver Framework.

Int. J. Netw. Comput., 2016

A Code Selection Mechanism Using Deep Learning.

Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016

The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads.

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

Xevdriver: A Software System Supporting XML-based Source-to-Source Code Transformations on Fortran Programs.

Reiji Suda

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A Directive Generation Approach Using User-Defined Rules.

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A User-Defined Code Transformation Approach to Overlapping MPI Communication with Computation.

Yasuharu Hayashi

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

A cache partitioning mechanism to protect shared data for CMPs.

Proceedings of the 2016 IEEE Symposium in Low-Power and High-Speed Chips, 2016

2015

Optimized Data Transfers Based on the OpenCL Event Management Mechanism.

Sci. Program., 2015

Identification and Elimination of Platform-Specific Code Smells in High Performance Computing Applications.

Int. J. Netw. Comput., 2015

FLEXII: A Flexible Insertion Policy for Dynamic Cache Resizing Mechanisms.

IEICE Trans. Electron., 2015

A Light-Weight Rollback Mechanism for Testing Kernel Variants in Auto-Tuning.

Shoichi Hirasawa

IEICE Trans. Inf. Syst., 2015

A Case Study of User-Defined Code Transformations for Data Layout Optimizations.

Proceedings of the Third International Symposium on Computing and Networking, 2015

Migration of an Atmospheric Simulation Code to an OpenACC Platform Using the Xevolver Framework.

Proceedings of the Third International Symposium on Computing and Networking, 2015

A Verification Framework for Streamlining Empirical Auto-Tuning.

Shoichi Hirasawa

Proceedings of the Third International Symposium on Computing and Networking, 2015

An energy-efficient dynamic memory address mapping mechanism.

Proceedings of the 2015 IEEE Symposium in Low-Power and High-Speed Chips, 2015

2014

MVP-Cache: A Multi-Banked Cache Memory for Energy-Efficient Vector Processing of Multimedia Applications.

IEICE Trans. Inf. Syst., 2014

Automatic Parameter Tuning of Hierarchical Incremental Checkpointing.

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

A Compiler-Assisted OpenMP Migration Method Based on Automatic Parallelizing Information.

Proceedings of the Supercomputing - 29th International Conference, 2014

An Approach to Customization of Compiler Directives for Application-Specific Code Transformations.

Proceedings of the IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs, 2014

A Platform-Specific Code Smell Alert System for High Performance Computing Applications.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Xevolver: An XML-based code translation framework for supporting HPC application migration.

Proceedings of the 21st International Conference on High Performance Computing, 2014

An energy optimization method for vector processing mechanisms.

Proceedings of the 2014 IEEE Symposium on Low-Power and High-Speed Chips, 2014

On-chip checkpointing with 3D-stacked memories.

Proceedings of the 2014 International 3D Systems Integration Conference, 2014

2013

A Capacity-Aware Thread Scheduling Method Combined with Cache Partitioning to Reduce Inter-Thread Cache Conflicts.

IEICE Trans. Inf. Syst., 2013

Balanced Ternary Quantum Voltage Generator Based on Zero Crossing Shapiro Steps in Asymmetric Two-Junction SQUIDs.

Masataka Moriya

Yoshinao Mizugaki

IEICE Trans. Electron., 2013

clMPI: An OpenCL Extension for Interoperation with the Message Passing Interface.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Design and evaluation of a media-oriented vector processor with a multi-banked cache memory.

Proceedings of the 11th IEEE Symposium on Embedded Systems for Real-time Multimedia, 2013

A flexible insertion policy for dynamic cache resizing mechanisms.

Proceedings of the 2013 IEEE Symposium on Low-Power and High-Speed Chips, 2013

2012

Poster: Exploring Design Space of a 3D Stacked Vector Cache - Designing a 3D Stacked Vector Cache using Conventional EDA Tools.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Exploring Design Space of a 3D Stacked Vector Cache.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

GPU implementation of phase-based stereo correspondence and its application.

Proceedings of the 19th IEEE International Conference on Image Processing, 2012

A media-oriented vector architectural extension with a high bandwidth cache system.

Proceedings of the 2012 IEEE Symposium on Low-Power and High-Speed Chips, 2012

A capacity-efficient insertion policy for dynamic cache resizing mechanisms.

Proceedings of the Computing Frontiers Conference, CF'12, 2012

An out-of-order vector processing mechanism for multimedia applications.

Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011

Power-Aware Dynamic Cache Partitioning for CMPs.

Trans. High Perform. Embed. Archit. Compil., 2011

A Self-Organized Overlay Network Management Mechanism for Heterogeneous Environments.

Tsutomu Inaba

J. Inf. Process., 2011

A Network Clustering Algorithm for Sybil-Attack Resisting.

IEICE Trans. Inf. Syst., 2011

A History-Based Performance Prediction Model with Profile Data Classification for Automatic Task Allocation in Heterogeneous Computing Systems.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

CheCL: Transparent Checkpointing and Process Migration of OpenCL Applications.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

NVCR: A Transparent Checkpoint-Restart Library for NVIDIA CUDA.

Akira Nukada

Satoshi Matsuoka

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Effects of 3-D stacked vector cache on energy consumption.

Proceedings of the 2011 IEEE International 3D Systems Integration Conference (3DIC), Osaka, Japan, January 31, 2011

2010

Resisting Sybil Attack By Social Network and Network Clustering.

Proceedings of the Tenth Annual International Symposium on Applications and the Internet, 2010

A voting-based working set assessment scheme for dynamic cache resizing mechanisms.

Proceedings of the 28th International Conference on Computer Design, 2010

A Majority-Based Control Scheme for Way-Adaptable Caches.

Proceedings of the Facing the Multicore-Challenge, 2010

A Load-Forwarding Mechanism for the Vector Architecture in Multimedia Applications.

Proceedings of the 13th Euromicro Conference on Digital System Design, 2010

Cache partitioning strategies for 3-D stacked vector processors.

Proceedings of the IEEE International Conference on 3D System Integration, 2010

Design and early evaluation of a 3-D die stacked chip multi-vector processor.

Proceedings of the IEEE International Conference on 3D System Integration, 2010

Automatic Tuning of CUDA Execution Parameters for Stencil Processing.

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009

A Performance Study of Secure Data Mining on the Cell Processor.

Hong Wang

Int. J. Grid High Perform. Comput., 2009

Performance evaluation of NEC SX-9 using real science and engineering applications.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

CheCUDA: A Checkpoint/Restart Tool for CUDA Applications.

Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

Performance tuning and analysis of future vector processors based on the roofline model.

Proceedings of the 10th workshop on MEmory performance, 2009

3D on-chip memory for the vector architecture.

Proceedings of the IEEE International Conference on 3D System Integration, 2009

2008

A Reliability Model for Result Checking in Volunteer Computing.

Proceedings of the 2008 International Symposium on Applications and the Internet, 2008

Consideration of Resource Access History for Optimizing Overlay Networks in P2P-Based Resource Discovery.

Proceedings of the 2008 International Symposium on Applications and the Internet, 2008

A shared cache for a chip multi vector processor.

Proceedings of the 9th workshop on MEmory performance, 2008

Modeling of cache access behavior based on Zipf's law.

Proceedings of the 9th workshop on MEmory performance, 2008

A Utility-Based Double Auction Mechanism for Efficient Grid Resource Allocation.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2008

Effects of MSHR and Prefetch Mechanisms on an On-Chip Cache of the Vector Architecture.

Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2008

Implementation and evaluation of a distributed and cooperative load-balancing mechanism for dependable volunteer computing.

Proceedings of the 38th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2008

SPRAT: Runtime processor selection for energy-aware computing.

Katsuto Sato

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

First Experiences with NEC SX-9.

Proceedings of the High Performance Computing on Vector Systems 2008, 2008

2007

Partial distortion entropy maximization for online data clustering.

Neural Networks, 2007

A dependable Peer-to-Peer computing platform.

Hong Wang

Future Gener. Comput. Syst., 2007

An on-chip cache design for vector processors.

Proceedings of the 2007 workshop on MEmory performance, 2007

A power-aware shared cache mechanism based on locality assessment of memory reference for CMPs.

Proceedings of the 2007 workshop on MEmory performance, 2007

2006

Hierarchical parallel processing of large scale data clustering on a PC cluster with GPU co-processing.

J. Supercomput., 2006

Evaluating Computational Performance of Backpropagation Learning on Graphics Hardware.

Tatsuya Chida

Proceedings of the Irish Conference on the Mathematical Foundations of Computer Science and Information Technology, 2006

A distributed and cooperative load balancing mechanism for large-scale P2P systems.

Proceedings of the 2006 International Symposium on Applications and the Internet Workshops (SAINT 2006 Workshops), 2006

Design and Implementation of an Efficient Search Mechanism Based on the Hybrid P2P Model for Ubiquitous Computing Systems.

Proceedings of the 2006 International Symposium on Applications and the Internet (SAINT 2006), 2006

Implications of Memory Performance for Highly Efficient Supercomputing of Scientific Applications.

Proceedings of the Parallel and Distributed Processing and Applications, 2006

Radiative Heat Transfer Simulation Using Programmable Graphics Hardware.

Proceedings of the 5th Annual IEEE/ACIS International Conference on Computer and Information Science (ICIS 2006) and 1st IEEE/ACIS International Workshop on Component-Based Software Engineering, 2006

2005

Locality analysis to control dynamically way-adaptable caches.

Isao Kotera

SIGARCH Comput. Archit. News, 2005

A Self-Organizing Overlay Network to Exploit the Locality of Interests for Effective Resource Discovery in P2P Systems.
