Toshio Endo
ORCID: 0000-0001-7297-6211
According to our database, Toshio Endo authored at least 82 papers between 1997 and 2024.
Bibliography
2024
SuperGCN: General and Scalable Framework for GCN Training on CPU-powered Supercomputers.
CoRR, 2024
Challenges in Computing Resource Sharing Towards Next-Gen Interactive Accelerated HPC.
Proceedings of the High Performance Computing. ISC High Performance 2024 International Workshops, 2024
Proceedings of the Advancing OpenMP for Future Accelerators, 2024
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024
Accelerating Stencil Computations on a GPU by Combining Using Tensor Cores and Temporal Blocking.
Proceedings of the 16th Workshop on General Purpose Processing Using GPU, 2024
Proceedings of the IEEE International Conference on Cluster Computing, 2024
Proceedings of the IEEE International Conference on Cluster Computing, 2024
Proceedings of the IEEE International Conference on Cluster Computing, 2024
Proceedings of the IEEE International Conference on Cluster Computing, 2024
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024
2023
Pyramid Swin Transformer: Different-Size Windows Swin Transformer for Image Classification and Object Detection.
Proceedings of the 18th International Joint Conference on Computer Vision, 2023
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications.
Proceedings of the 37th International Conference on Supercomputing, 2023
The Aggressive Oversubscribing Scheduling for Interactive Jobs on a Supercomputing System.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023
Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt).
Proceedings of the 15th Workshop on General Purpose Processing Using GPU, 2023
Proceedings of the Advanced Concepts for Intelligent Vision Systems, 2023
2022
mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.
CoRR, 2022
Proceedings of the 23rd ACIS International Summer Virtual Conference on Software Engineering, 2022
Proceedings of the IEEE Intl Conf on Parallel & Distributed Processing with Applications, 2022
mdx: A Cloud Platform for Supporting Data Science and Cross-Disciplinary Research Collaborations.
Proceedings of the IEEE Intl. Conf. on Dependable, 2022
2021
Measurement and Modeling of Performance of HPC Applications Towards Overcommitting Scheduling Systems.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2021
Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021
2020
Integrating Cache Oblivious Approach with Modern Processor Architecture: The Case of Floyd-Warshall Algorithm.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020
2019
An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation.
ACM Trans. Archit. Code Optim., 2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
2018
Proceedings of the 26th Euromicro International Conference on Parallel, 2018
Applying Recursive Temporal Blocking for Stencil Computations to Deeper Memory Hierarchy.
Proceedings of the IEEE 7th Non-Volatile Memory Systems and Applications Symposium, 2018
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
Exhaustive evaluation of memory-latency sensitivity on manycore processors with large cache.
Proceedings of the 2nd International Conference on High Performance Compilation, 2018
2017
Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, 2017
An Accurate Simulator of Cache-Line Conflicts to Exploit the Underlying Cache Performance.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
A Stencil Framework to Realize Large-Scale Computations Beyond Device Memory Capacity on GPU Supercomputers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
ExanaDBT: A Dynamic Compilation System for Transparent Polyhedral Optimizations at Runtime.
Proceedings of the Computing Frontiers Conference, 2017
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
2016
Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware, 2016
Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers.
Proceedings of the Mathematical Software - ICMS 2016, 2016
Realizing Out-of-Core Stencil Computations Using Multi-tier Memory Hierarchy on GPGPU Clusters.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016
2015
Proceedings of the SMARTGREENS 2015, 2015
The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs.
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015
Investigating potential performance benefits of memory layout optimization based on roofline model.
Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, 2015
Exana: an execution-driven application analysis tool for assisting productive performance tuning.
Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, 2015
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015
2014
Int. J. High Perform. Comput. Appl., 2014
Petascale General Solver for Semidefinite Programming Problems with Over Two Million Constraints.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
TSUBAME-KFC: A modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
An evaluation of the potential of flash SSD as large and slow memory for stencil computations.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014
Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
2013
A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
2012
High-performance general solver for extremely large-scale semidefinite programming problems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
2011
Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer.
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the Conference on High Performance Computing Networking, 2011
2010
An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code.
Proceedings of the Conference on High Performance Computing Networking, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the International Green Computing Conference 2010, 2010
2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009
2008
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
2007
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007
2005
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005
2004
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004
2003
Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003
2002
Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002
2001
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
1998
Proceedings of IAPR Workshop on Machine Vision Applications, 1998
1997
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997