Toshio Endo
Proceedings of the Computing Frontiers Conference, 2017

ooc_cuDNN: Accommodating convolutional neural networks over GPU memory capacity.
Yuki Ito, Ryo Matsumiya
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

PGAS Communication Runtime for Extreme Large Data Computation.
Ryo Matsumiya
Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware, 2016

Advanced Computing and Optimization Infrastructure for Extremely Large-Scale Graphs on Post Peta-Scale Supercomputers.
Katsuki Fujisawa, Yuichiro Yasui
Proceedings of the Mathematical Software - ICMS 2016, 2016

Realizing Out-of-Core Stencil Computations Using Multi-tier Memory Hierarchy on GPGPU Clusters.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

From FLOPS to BYTES: disruptive change in high-performance computing towards the post-moore era.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Evaluating the impacts of code-level performance tunings on power efficiency.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Power Capping of CPU-GPU Heterogeneous Systems using Power and Performance Models.
Kazuki Tsuzuku
Proceedings of the SMARTGREENS 2015, 2015

The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs.
Yuki Tsujita, Katsuki Fujisawa
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015

Investigating potential performance benefits of memory layout optimization based on roofline model.
Shimpei Sato, Yukinori Sato
Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, 2015

Exana: an execution-driven application analysis tool for assisting productive performance tuning.
Yukinori Sato, Shimpei Sato
Proceedings of the 2nd International Workshop on Software Engineering for Parallel Systems, 2015

Data Driven Scheduling Approach for the Multi-node Multi-GPU Cholesky Decomposition.
Yuki Tsujita
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2015

Exploration of Lossy Compression for Application-Level Checkpoint/Restart.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Realizing Extremely Large-Scale Stencil Applications on GPU Supercomputers.
Yuki Takasaki
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Petascale General Solver for Semidefinite Programming Problems with Over Two Million Constraints.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

TSUBAME-KFC: A modern liquid submersion cooling prototype towards exascale becoming the greenest supercomputer in the world.
Akira Nukada
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

An evaluation of the potential of flash SSD as large and slow memory for stencil computations.
Hiroko Midorikawa, Hideyuki Tan
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

Software technologies coping with memory hierarchy of GPGPU clusters for stencil computations.
Guanghao Jin
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

A Multi-Level Optimization Method for Stencil Computation on the Domain that is Bigger than Memory Capacity of GPU.
Guanghao Jin
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A parallel optimization method for stencil computation on the domain that is bigger than memory capacity of GPUs.
Guanghao Jin
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

High-performance general solver for extremely large-scale semidefinite programming problems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer.
Proceedings of the Conference on High Performance Computing Networking, 2011

Petaflop biofluidics simulations on a two million-core system.
Proceedings of the Conference on High Performance Computing Networking, 2011

An 80-Fold Speedup, 15.0 TFlops Full GPU Acceleration of Non-Hydrostatic Weather Model ASUCA Production Code.
Proceedings of the Conference on High Performance Computing Networking, 2010

Linpack evaluation on a supercomputer with heterogeneous accelerators.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Statistical power modeling of GPU kernels using performance counters.
Proceedings of the International Green Computing Conference 2010, 2010

Power-aware dynamic task scheduling for heterogeneous accelerated clusters.
Tomoaki Hamano
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

File Clustering Based Replication Algorithm in a Grid Environment.
Hitoshi Sato
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Bandwidth intensive 3-D FFT kernel for GPUs using CUDA.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Locality aware MPI communication on a commodity opto-electronic hybrid network.
Shin'ichiro Takizawa
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

An efficient, model-based CPU-GPU heterogeneous FFT library.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Performance evaluation of parallel applications on next generation memory architecture with power-aware paging method.
Y. Hosogaya
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Massive supercomputing coping with heterogeneity of modern accelerators.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Access-pattern and bandwidth aware file replication algorithm in a grid environment.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Environmental-aware optimization of MPI checkpointing intervals.
Hideyuki Jitsumoto
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

ABARIS: An Adaptable Fault Detection/Recovery Component Framework for MPIs.
Hideyuki Jitsumoto
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High-Performance MPI Broadcast Algorithm for Grid Environments Utilizing Multi-lane NICs.
Tatsuhiro Chiba
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Highly latency tolerant Gaussian elimination.
Proceedings of the 6th IEEE/ACM International Conference on Grid Computing (GRID 2005), 2005

High performance LU factorization for non-dedicated clusters.
Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

Phoenix: a parallel programming model for accommodating dynamically joining/leaving resources.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2003

Reducing pause time of conservative collectors.
Proceedings of The Workshop on Memory Systems Performance (MSP 2002), 2002

Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors.
Akinori Yonezawa
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

On a High-Speed Hough Transform Algorithm MRHT.
Proceedings of IAPR Workshop on Machine Vision Applications, 1998

A Scalable Mark-Sweep Garbage Collector on Large-Scale Shared-Memory Machines.
