Reiji Suda

Proceedings of the Fifth International Symposium on Computing and Networking, 2017

Fast maximal Poisson-disk sampling by randomized tiling.

Tong Wang

Proceedings of High Performance Graphics, 2017

2016

Xevtgen: Fortran code transformer generator for high performance scientific codes.

Hiroyuki Takizawa

Shoichi Hirasawa

Int. J. Netw. Comput., 2016

Efficient Parallel Algorithm for Optimal DAG Structure Search on Parallel Computer with Torus Network.

Hirokazu Honda

Yoshinori Tamada

Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

Xevdriver: A Software System Supporting XML-based Source-to-Source Code Transformations on Fortran Programs.

Hiroyuki Takizawa

Proceedings of the Fourth International Symposium on Computing and Networking, 2016

2015

Performance Analysis of the Chebyshev Basis Conjugate Gradient Method on the K Computer.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

2014

The future of accelerator programming: abstraction, performance or can we have both?

Martin Burtscher

Proceedings of the Symposium on Applied Computing, 2014

2013

Analysis Of The Girth For Regular Bi-partite Graphs With Degree 3

CoRR, 2013

Enumeration Based Search Algorithm For Finding A Regular Bi-partite Graph Of Maximum Attainable Girth For Specified Degree And Number Of Vertices

CoRR, 2013

An Efficient Task Partitioning and Scheduling Method for Symmetric Multiple GPU Architecture.

Cheng Luo

Proceedings of the 12th IEEE International Conference on Trust, 2013

Tian Xiaochen

Proceedings of the 3rd Workshop on Irregular Applications - Architectures and Algorithms, 2013

High Performance GPU Accelerated Local Optimization in TSP.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A Mathematical Method for Online Autotuning of Power and Energy Consumption with Corrected Temperature Effects.

Luo Cheng

Takahiro Katagiri

Proceedings of the International Conference on Computational Science, 2013

2012

Energy-Aware SIMD Algorithm Design on GPU and Multicore Architectures.

Proceedings of the Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

Global optimization model on power efficiency of GPU and multicore processing element for SIMD computing with CUDA.

Comput. Sci. Res. Dev., 2012

Partition Parameters for Girth Maximum (m, r) BTUs

CoRR, 2012

Balanced Tanner Units And Their Properties

CoRR, 2012

Automatic Parameter Optimization for Edit Distance Algorithm on GPU.

Ayumu Tomiyama

Proceedings of the High Performance Computing for Computational Science, 2012

Brief announcement: a GPU accelerated iterated local search TSP solver.

Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Poster: High Performance GPU Accelerated TSP Solver.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: High Performance GPU Accelerated TSP Solver.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

MSSM: An Efficient Scheduling Mechanism for CUDA Basing on Task Partition.

Cheng Luo

Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

An efficient GPU implementation of a multi-start TSP solver for large problem instances.

Proceedings of the Genetic and Evolutionary Computation Conference, 2012

Accelerating 2-opt and 3-opt Local Search Using GPU in the Travelling Salesman Problem.

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011

APTCC: Auto Parallelizing Translator From C To CUDA.

Takehiko Nawata

Proceedings of the International Conference on Computational Science, 2011

Parallel Monte Carlo Tree Search on GPU.

Proceedings of the Eleventh Scandinavian Conference on Artificial Intelligence, 2011

Parallelizing a Coarse Grain Graph Search Problem Based upon LDPC Codes on a Supercomputer.

Proceedings of the Sixth International Symposium on Parallel Computing in Electrical Engineering (PARELEC 2011), 2011

Large-Scale Parallel Monte Carlo Tree Search on GPU.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

A Performance and Energy Consumption Analytical Model for GPU.

Cheng Luo

Proceedings of the IEEE Ninth International Conference on Dependable, 2011

Parallel Monte Carlo Tree Search Scalability Discussion.

Proceedings of the AI 2011: Advances in Artificial Intelligence, 2011

Experimental Estimation and Analysis of the Power Efficiency of CUDA Processing Element on SIMD Computing.

Proceedings of the 10th IEEE/ACIS International Conference on Computer and Information Science, 2011

2010

Investigation on the power efficiency of multi-core and GPU Processing Element in large scale SIMD computation with CUDA.

Proceedings of the International Green Computing Conference 2010, 2010

Software Automatic Tuning: Concepts and State-of-the-Art Results.

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

A Bayesian Method of Online Automatic Tuning.

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

Autotuning Method for Deciding Block Size Parameters in Dynamically Load-Balanced BLAS.

Yuta Sawa

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

Toward Automatic Performance Tuning for Numerical Simulations in the SILC Matrix Computation Framework.

Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009

Parallel Minimax Tree Searching on GPU.

Proceedings of the Parallel Processing and Applied Mathematics, 2009

Modeling and Optimizing the Power Performance of Large Matrices Multiplication on Multi-core and GPU Platform with CUDA.

Proceedings of the Parallel Processing and Applied Mathematics, 2009

Accurate Measurements and Precise Modeling of Power Dissipation of CUDA Kernels toward Power Optimized High Performance CPU-GPU Computing.

Proceedings of the 2009 International Conference on Parallel and Distributed Computing, 2009

Modeling and Estimation for the Power Consumption of Matrix Computation on Multi-core Platform.

Proceedings of the Second International Joint Conference on Computational Sciences and Optimization, 2009

Power Efficient Large Matrices Multiplication by Load Scheduling on Multi-core and GPU Platform with CUDA.

Proceedings of the 12th IEEE International Conference on Computational Science and Engineering, 2009

Aspects of GPU for general purpose high performance computing.

Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008

Divisible load scheduling with improved asymptotic optimality.

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

An optimized Dynamic Load Balancing method for parallel 3-D mesh refinement for finite element electromagnetics with Tetrahedra.

Dennis Giannacopoulos

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007

Cloth Simulation in the SILC Matrix Computation Framework: A Case Study.

Proceedings of the Parallel Processing and Applied Mathematics, 2007

High Performance FFT on SGI Altix 3700.

Proceedings of the High Performance Computing and Communications, 2007

2006

Distributed SILC: An Easy-to-Use Interface for MPI-Based Parallel Matrix Computation Libraries.

Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

2005

SILC: A Flexible and Environment-Independent Interface for Matrix Computation Libraries.

Proceedings of the Parallel Processing and Applied Mathematics, 2005

Performance Evaluation of Parallel Sparse Matrix-Vector Products on SGI Altix3700.

Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

2002

A fast spherical harmonics transform algorithm.

Masayasu Takami

Math. Comput., 2002

1999

A high performance parallelization scheme for the Hessenberg double shift QR algorithm.

Akira Nishida

Yoshio Oyanagi

Parallel Comput., 1999

1998

The Ensparsed LU Decomposition Method for Large Scale Circuit Transient Analysis.

Yoshio Oyanagi

Proceedings of the ASP-DAC '98, 1998

1995

Implementation of Sparta, a Highly Parallel Circuit Simulator by the Preconditioned Jacobi Method, on a Distributed Memory Machine.
