Kentaro Sano

Orcid: 0000-0002-6681-4192

According to our database, Kentaro Sano authored at least 101 papers between 1997 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing.
Future Gener. Comput. Syst., 2025

2024
Across Time and Space: Senju's Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs.
ACM Trans. Reconfigurable Technol. Syst., June, 2024

Introduction to the Special Issue on FPL 2022.
ACM Trans. Reconfigurable Technol. Syst., June, 2024

Automated parallel execution of distributed task graphs with FPGA clusters.
Future Gener. Comput. Syst., 2024

Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

RAW 2024 Invited Talk-6: Reconfigurable Architectures for High-Performance Computing.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

A Productive HLS Simulation Approach for Multi-FPGA Systems.
Proceedings of the IEEE International Conference on Consumer Electronics, 2024

HLS Implementation of a Building Cube Stencil Computation Framework for an FPGA Accelerator.
Proceedings of the IEEE International Conference on Consumer Electronics, 2024

Flexible Systolic Array Platform on Virtual 2-D Multi-FPGA Plane.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

Scalable Connection of Qubits to Quantum Error Correction Systems Using Ethernet.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Post-Route Power Estimation: A Case Study of RIKEN-CGRA.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
VCSN: Virtual Circuit-Switching Network for Flexible and Simple-to-Operate Communication in HPC FPGA Cluster.
ACM Trans. Reconfigurable Technol. Syst., June, 2023

Experimental Survey of FPGA-Based Monolithic Switches and a Novel Queue Balancer.
IEEE Trans. Parallel Distributed Syst., May, 2023

Streaming Hardware Compressor Generator Framework.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Hardware Specialization: Estimating Monte Carlo Cross-Section Lookup Kernel Performance and Area.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Achieving Scalable Quantum Error Correction with Union-Find on Systolic Arrays by Using Multi-Context Processing Elements.
Proceedings of the IEEE International Conference on Quantum Computing and Engineering, 2023

Novel Union-Find-based Decoders for Scalable Quantum Error Correction on Systolic Arrays.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Less for More: Reducing Intra-CGRA Connectivity for Higher Performance and Efficiency in HPC.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

ESSPER: Elastic and Scalable FPGA-Cluster System for High-Performance Reconfigurable Computing with Supercomputer Fugaku.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023

Exploration of Compute vs. Interconnect Tradeoffs in CGRAs for HPC.
Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2023

Journal Track Paper ICFPT 2023: Across Time and Space: Senju's Approach for Scaling Iterative Stencil Loop Accelerators on Single and Multiple FPGAs.
Proceedings of the International Conference on Field Programmable Technology, 2023

Performance Modeling and Scalability Analysis of Stream Computing in ESSPER FPGA Clusters.
Proceedings of the International Conference on Field Programmable Technology, 2023

Senju: A Framework for the Design of Highly Parallel FPGA-based Iterative Stencil Loop Accelerators.
Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2023

2022
The First International Workshop on Coarse-Grained Reconfigurable Architectures for High-Performance Computing (CGRA4HPC).
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

An Architecture-Independent CGRA Compiler enabling OpenMP Applications.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Exploration Framework for Synthesizable CGRAs Targeting HPC: Initial Design and Evaluation.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

FPGA-Dedicated Network vs. Server Network for Pipelined Computing with Multiple FPGAs.
Proceedings of the HEART 2022: International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, Tsukuba, Japan, June 9, 2022

Stream Computation of 3D Approximate Convex Hulls with an FPGA.
Proceedings of the HEART 2022: International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, Tsukuba, Japan, June 9, 2022

A SYCL-based high-level programming framework for HPC programmers to use remote FPGA clusters.
Proceedings of the HEART 2022: International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, Tsukuba, Japan, June 9, 2022

ESSPER: Elastic and Scalable System for High-Performance Reconfigurable Computing with Software-bridged APIs.
Proceedings of the International Conference on Field-Programmable Technology, 2022

Elastic Sample Filter: An FPGA-based Accelerator for Bayesian Network Structure Learning.
Proceedings of the International Conference on Field-Programmable Technology, 2022

Exploring Inter-tile Connectivity for HPC-oriented CGRA with Lower Resource Usage.
Proceedings of the International Conference on Field-Programmable Technology, 2022

The Cost of Flexibility: Embedded versus Discrete Routers in CGRAs for HPC.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Efficient Queue-Balancing Switch for FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2021

A memory bandwidth improvement with memory space partitioning for single-precision floating-point FFT on Stratix 10 FPGA.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Virtual Circuit-Switching Network with Flexible Topology for High-Performance FPGA Cluster.
Proceedings of the 32nd IEEE International Conference on Application-specific Systems, 2021

2020
White Paper from Workshop on Large-scale Parallel Numerical Computing Technology (LSPANC 2020): HPC and Computer Arithmetic toward Minimal-Precision Computing.
CoRR, 2020

A Survey on Coarse-Grained Reconfigurable Architectures From a Performance Perspective.
IEEE Access, 2020

OpenMP Device Offloading to FPGAs Using the Nymble Infrastructure.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Performance Evaluation of Pipelined Communication Combined with Computation in OpenCL Programming on FPGA.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Performance Evaluation and Power Analysis of Teraflop-scale Fluid Simulation with Stratix 10 FPGA.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Extending High-Level Synthesis with High-Performance Computing Performance Visualization.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

A Template-based Framework for Exploring Coarse-Grained Reconfigurable Architectures.
Proceedings of the 31st IEEE International Conference on Application-specific Systems, 2020

Comparison of Direct and Indirect Networks for High-Performance FPGA Clusters.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2020

2019
Scalability Analysis of Deeply Pipelined Tsunami Simulation with Multiple FPGAs.
IEICE Trans. Inf. Syst., 2019

A High Level Synthesis Approach for Application Specific DMA Controllers.
Proceedings of the 2019 International Conference on ReConFigurable Computing and FPGAs, 2019

Crossbar Implementation with Partial Reconfiguration for Stream Switching Applications on an FPGA.
Proceedings of the Parallel Computing: Technology Trends, 2019

Hybrid Network Utilization for Efficient Communication in a Tightly Coupled FPGA Cluster.
Proceedings of the International Conference on Field-Programmable Technology, 2019

A software bridged data transfer on a FPGA cluster by using pipelining and InfiniBand verbs.
Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

Scaling Performance for N-Body Stream Computation with a Ring of FPGAs.
Proceedings of the 10th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2019

FPGA implementation of a robot control algorithm.
Proceedings of the 24th IEEE International Conference on Emerging Technologies and Factory Automation, 2019

2018
A Guide of Fingerprint Based Radio Emitter Localization Using Multiple Sensors.
IEICE Trans. Commun., 2018

High-productivity Programming and Optimization Framework for Stream Processing on FPGA.
Proceedings of the 9th International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies, 2018

Enhancing Memory Bandwidth in a Single Stream Computation with Multiple FPGAs.
Proceedings of the International Conference on Field-Programmable Technology, 2018

Performance Analysis of Hardware-Based Numerical Data Compression on Various Data Formats.
Proceedings of the 2018 Data Compression Conference, 2018

Performance Estimation of Deeply Pipelined Fluid Simulation on Multiple FPGAs with High-speed Communication Subsystem.
Proceedings of the 29th IEEE International Conference on Application-specific Systems, 2018

Hardware Algorithms.
Proceedings of the Principles and Structures of FPGAs, 2018

2017
Bandwidth Compression of Floating-Point Numerical Data Streams for FPGA-Based High-Performance Computing.
ACM Trans. Reconfigurable Technol. Syst., 2017

FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks.
IEEE Trans. Parallel Distributed Syst., 2017

FPGA-based tsunami simulation: Performance comparison with GPUs, and roofline model for scalability analysis.
J. Parallel Distributed Comput., 2017

Design and scalability analysis of bandwidth-compressed stream computing with multiple FPGAs.
Proceedings of the 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip, 2017

FPGA-based Stream Computing for High-Performance N-Body Simulation using Floating-Point DSP Blocks.
Proceedings of the 8th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2017

2016
Parallelism for High-Performance Tsunami Simulation with FPGA: Spatial or Temporal?
Proceedings of the 24th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2016

2015
Stream Computation of Shallow Water Equation Solver for FPGA-based 1D Tsunami Simulation.
SIGARCH Comput. Archit. News, 2015

DSL-based Design Space Exploration for Temporal and Spatial Parallelism of Custom Stream Computing.
CoRR, 2015

2014
Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth.
IEEE Trans. Parallel Distributed Syst., 2014

FPGA-based Custom Computing Architecture for Large-Scale Fluid Simulation with Building Cube Method.
SIGARCH Comput. Archit. News, 2014

Stream Processor Generator for HPC to Embedded Applications on FPGA-based System Platform.
CoRR, 2014

Bandwidth compression of multiple numerical data streams for high performance custom computing.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

2013
Efficient custom computing of fully-streamed lattice boltzmann method on tightly-coupled FPGA cluster.
SIGARCH Comput. Archit. News, 2013

Parallel and scalable custom computing for real-time fluid simulation on a cluster node with four tightly-coupled FPGAs.
Proceedings of the 23rd International Conference on Field programmable Logic and Applications, 2013

Parameterized Design and Evaluation of Bandwidth Compressor for Floating-Point Data Streams in FPGA-Based Custom Computing.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2013

2012
FPGA-based Connect6 solver with hardware-accelerated move refinement.
SIGARCH Comput. Archit. News, 2012

The NII Shonan Configurable Computing Workshop (NII Shonan Meeting 2012-11).
NII Shonan Meet. Rep., 2012

High-Performance Reconfigurable Computing.
Int. J. Reconfigurable Comput., 2012

Multi-sensor location estimation for illegal cell-phone use in real-life indoor environment.
Proceedings of the IEEE International Conference on Communication Systems, 2012

Cooling efficiency aware workload placement using historical sensor data on IT-facility collaborative control.
Proceedings of the 2012 International Green Computing Conference, 2012

Scalability analysis of tightly-coupled FPGA-cluster for lattice Boltzmann computation.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

Domain-Specific Language and Compiler for Stencil Computation on FPGA-Based Systolic Computational-Memory Array.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

2011
Domain-specific programmable design of scalable streaming-array for power-efficient stencil computation.
SIGARCH Comput. Archit. News, 2011

SW and HW co-design of Connect6 accelerator with scalable streaming cores.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Scalable Streaming-Array of Simple Soft-Processors for Stencil Computations with Constant Memory-Bandwidth.
Proceedings of the IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines, 2011

2010
FPGA-Array with Bandwidth-Reduction Mechanism for Scalable and Power-Efficient Numerical Simulations Based on Finite Difference Methods.
ACM Trans. Reconfigurable Technol. Syst., 2010

Prototype implementation of array-processor extensible over multiple FPGAs for scalable stencil computation.
SIGARCH Comput. Archit. News, 2010

Local-and-global stall mechanism for systolic computational-memory array on extensible multi-FPGA system.
Proceedings of the International Conference on Field-Programmable Technology, 2010

Segment-Parallel Predictor for FPGA-Based Hardware Compressor and Decompressor of Floating-Point Data Streams to Enhance Memory I/O Bandwidth.
Proceedings of the 2010 Data Compression Conference (DCC 2010), 2010

FPGA-based lossless compressors of floating-point data streams to enhance memory bandwidth.
Proceedings of the 21st IEEE International Conference on Application-specific Systems Architectures and Processors, 2010

2008
Scalable FPGA-array for high-performance and power-efficient computation based on difference schemes.
Proceedings of the 2008 Second International Workshop on High-Performance Reconfigurable Computing Technology and Applications, 2008

Evaluating power and energy consumption of FPGA-based custom computing machines for scientific floating-point computation.
Proceedings of the 2008 International Conference on Field-Programmable Technology, 2008

2007
FPGA-based Streaming Computation for Lattice Boltzmann Method.
Proceedings of the 2007 International Conference on Field-Programmable Technology, 2007

Systolic Architecture for Computational Fluid Dynamics on FPGAs.
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007

2005
A Competitive Learning Algorithm with Controlling Maximum Distortion.
J. Adv. Comput. Intell. Intell. Informatics, 2005

2004
Efficient parallel processing of competitive learning algorithms.
Parallel Comput., 2004

Differential coding scheme for efficient parallel image composition on a PC cluster system.
Parallel Comput., 2004

A Systolic Memory Architecture for Fast Codebook Design based on MMPDCL Algorithm.
Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04), 2004

Parallel competitive learning algorithm for fast codebook design on partitioned space.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
A Comparison Study of Vector Quantization Codebook Design Algorithms based on the Equidistortion Principle.
Proceedings of the 21st IASTED International Multi-Conference on Applied Informatics (AI 2003), 2003

2002
Parallel Algorithm for the Law-of-the-Jungle Learning to the Fast Design of Optimal Codebooks.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002

Hardware Support for Concurrent Execution of Loops Containing Loop-carried Data Dependences.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002

2001
3DCGiRAM: An Intelligent Memory Architecture for Photo-Realistic Image Synthesis.
Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

1997
Parallel processing of the shear-warp factorization with the binary-swap method on a distributed-memory multiprocessor system.
Proceedings of the IEEE Symposium on Parallel Rendering, 1997

