Daisuke Takahashi

Orcid: 0000-0003-1357-5770

According to our database, Daisuke Takahashi authored at least 121 papers between 1999 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Counterfactual Explanations of Black-box Machine Learning Models using Causal Discovery with Applications to Credit Rating.
Proceedings of the International Joint Conference on Neural Networks, 2024

Preliminary Performance Evaluation of Grace-Hopper GH200.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
Fast Multiple-Precision Integer Division Using Intel AVX-512.
IEEE Trans. Emerg. Top. Comput., 2023

Asymptotic solutions to a fuzzy elementary cellular automaton of rule number 38.
JSIAM Lett., 2023

Multiple Integer Divisions with an Invariant Dividend and Monotonically Increasing or Decreasing Divisors.
Proceedings of the Computational Science and Its Applications - ICCSA 2023, 2023

Efficient Large Integer Multiplication with Arm SVE Instructions.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2023

2022
Lattice equations and their solutions with complexity of polynomial class.
JSIAM Lett., 2022

Three-dimensional fundamental diagram of particle system of 5 neighbors with two conserved densities.
JSIAM Lett., 2022

An Implementation of Parallel Number-Theoretic Transform Using Intel AVX-512 Instructions.
Proceedings of the Computer Algebra in Scientific Computing - 24th International Workshop, 2022

2021
An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

A Rapid Euclidean Norm Calculation Algorithm that Reduces Overflow and Underflow.
Proceedings of the Computational Science and Its Applications - ICCSA 2021, 2021

2020
Xevolver: A code transformation framework for separation of system-awareness from application codes.
Concurr. Comput. Pract. Exp., 2020

Max-Plus Generalization of Conway's Game of Life.
Complex Syst., 2020

Fast Multiple Montgomery Multiplications Using Intel AVX-512IFMA Instructions.
Proceedings of the Computational Science and Its Applications - ICCSA 2020, 2020

Fast Computation of the Exact Number of Magic Series with an Improved Montgomery Multiplication Algorithm.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2020

FFTE on SVE: SPIRAL-Generated Kernels.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

2019
Reproducibility in Benchmarking Parallel Fast Fourier Transform based Applications.
Proceedings of the Companion of the 2019 ACM/SPEC International Conference on Performance Engineering, 2019

Implementation of Parallel 3-D Real FFT with 2-D Decomposition on Intel Xeon Phi Clusters.
Proceedings of the Parallel Processing and Applied Mathematics, 2019

Accelerating Large Integer Multiplication Using Intel AVX-512IFMA.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2019

Fast Fourier Transform Algorithms for Parallel Computers
High-Performance Computing Series 2, Springer, ISBN 978-981-13-9965-7, 2019

Fast Fourier Transform in Large-Scale Systems.
Proceedings of the Art of High Performance Computing for Computational Science, 2019

2018
Japanese Autotuning Research: Autotuning Languages and FFT.
Proc. IEEE, 2018

Computation of the 100 quadrillionth hexadecimal digit of π on a cluster of Intel Xeon Phi processors.
Parallel Comput., 2018

Max-plus equation with two conserved quantities and one monotonically decreasing quantity.
JSIAM Lett., 2018

Extended Reproduction of Demonstration Motion Using Variational Autoencoder.
Proceedings of the 27th IEEE International Symposium on Industrial Electronics, 2018

Acceleration of Large Integer Multiplication with Intel AVX-512 Instructions.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

2017
A Customizable Auto-Tuning Scenario with User-Defined Code Transformations.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

An Implementation of Parallel 1-D Real FFT on Intel Xeon Phi Processors.
Proceedings of the Computational Science and Its Applications - ICCSA 2017, 2017

2016
Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs.
Proceedings of the 10th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2016

Implementation of Multiple-Precision Floating-Point Arithmetic on Intel Xeon Phi Coprocessors.
Proceedings of the Computational Science and Its Applications - ICCSA 2016, 2016

Parallel Sparse Matrix-Vector Multiplication Using Accelerators.
Proceedings of the Computational Science and Its Applications - ICCSA 2016, 2016

Automatic Tuning of Computation-Communication Overlap for Parallel 1-D FFT.
Proceedings of the 2016 IEEE Intl Conference on Computational Science and Engineering, 2016

2015
Fast Implementation of General Matrix-Vector Multiplication (GEMV) on Kepler GPUs.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Performance Evaluation of Sparse Matrix-Vector Multiplication Using GPU/MIC Cluster.
Proceedings of the Third International Symposium on Computing and Networking, 2015

2014
Virtual flow-net for accountability and forensics of computer and network systems.
Secur. Commun. Networks, 2014

Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
J. Comput. Chem., 2014

Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer.
Int. J. High Perform. Comput. Appl., 2014

A study on application-aware power-saving control method for sensor stations in home gateway.
Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, 2014

A study on application-aware QoS control in OSGi based home gateway.
Proceedings of the IEEE 3rd Global Conference on Consumer Electronics, 2014

2013
Highly scalable implementation of an N-body code on a GPU cluster.
Comput. Phys. Commun., 2013

Using Quadruple Precision Arithmetic to Accelerate Krylov Subspace Methods on GPUs.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Optimization of Sparse Matrix-Vector Multiplication for CRS Format on NVIDIA Kepler Architecture GPUs.
Proceedings of the Computational Science and Its Applications - ICCSA 2013, 2013

Efficient Hybrid Breadth-First Search on GPUs.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

A study on OSGi based home gateway employing application-aware QoS control.
Proceedings of the IEEE 2nd Global Conference on Consumer Electronics, 2013

Implementation of Parallel 1-D FFT on GPU Clusters.
Proceedings of the 16th IEEE International Conference on Computational Science and Engineering, 2013

Optimizing Objective Function Parameters for Strength in Computer Game-Playing.
Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013

2012
Accountability using flow-net: design, implementation, and performance evaluation.
Secur. Commun. Networks, 2012

A Fast Implementation and Performance Analysis of Collisionless N-body Code Based on GPGPU.
Proceedings of the International Conference on Computational Science, 2012

Implementation of XcalableMP Device Acceleration Extention with OpenCL.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Implementation and Evaluation of Triple Precision BLAS Subroutines on GPUs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

An Implementation of Parallel 2-D FFT Using Intel AVX Instructions on Multi-core Processors.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2012

An Implementation of Parallel 1-D FFT on the K Computer.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Automatic Tuning of Sparse Matrix-Vector Multiplication for CRS Format on GPUs.
Proceedings of the 15th IEEE International Conference on Computational Science and Engineering, 2012

2011
Wireless telemedicine and m-health: technologies, applications and research issues.
Int. J. Sens. Networks, 2011

First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer.
Proceedings of the Conference on High Performance Computing Networking, 2011

Optimization of Sparse Matrix-Vector Multiplication by Auto Selecting Storage Schemes on GPU.
Proceedings of the Computational Science and Its Applications - ICCSA 2011, 2011

2010
Parallel implementation of multiple-precision arithmetic and 2,576,980,370,000 decimal digits of pi calculation.
Parallel Comput., 2010

A massively-parallel electronic-structure calculations based on real-space density functional theory.
J. Comput. Phys., 2010

A Shogi Program Based on Monte-Carlo Tree Search.
J. Int. Comput. Games Assoc., 2010

IEEE 802.11 user fingerprinting and its applications for intrusion detection.
Comput. Math. Appl., 2010

Implementation and Evaluation of Quadruple Precision BLAS Functions on GPUs.
Proceedings of the Applied Parallel and Scientific Computing, 2010

Automatic Tuning for Parallel FFTs.
Proceedings of the Software Automatic Tuning, From Concepts to State-of-the-Art Results, 2010

2009
On a discrete optimal velocity model and its continuous and ultradiscrete relatives.
JSIAM Lett., 2009

An Implementation of Parallel 3-D FFT with 2-D Decomposition on a Massively Parallel Cluster of Multi-core Processors.
Proceedings of the Parallel Processing and Applied Mathematics, 2009

2008
Retrieving knowledge from auditing log-files for computer and network forensics and accountability.
Secur. Commun. Networks, 2008

A parallel method for large sparse generalized eigenvalue problems using a GridRPC system.
Future Gener. Comput. Syst., 2008

Temperature-Aware Routing for Telemedicine Applications in Embedded Biomedical Sensor Networks.
EURASIP J. Wirel. Commun. Netw., 2008

On-Demand Anonymous Routing with Distance Vector Protecting Traffic Privacy in Wireless Multi-hop Networks.
Proceedings of the MSN 2008, 2008

Complexity Analysis of Retrieving Knowledge from Auditing Log Files for Computer and Network Forensics and Accountability.
Proceedings of IEEE International Conference on Communications, 2008

2007
Telemedicine Usage and Potentials.
Proceedings of the IEEE Wireless Communications and Networking Conference, 2007

A Parallel Algorithm for Multiple-Precision Division by a Single-Precision Integer.
Proceedings of the Large-Scale Scientific Computing, 6th International Conference, 2007

RI2N/UDP: High bandwidth and fault-tolerant network for a PC-cluster based on multi-link Ethernet.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High Performance FFT on SGI Altix 3700.
Proceedings of the High Performance Computing and Communications, 2007

LTRT: Least Total-Route Temperature Routing for Embedded Biomedical Sensor Networks.
Proceedings of the Global Communications Conference, 2007

2006
S12 - The HPC Challenge (HPCC) benchmark suite.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

An Implementation of Parallel 1-D FFT Using SSE3 Instructions on Dual-Core Processors.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Profile-based optimization of power performance by using dynamic voltage scaling on a PC cluster.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

MegaProto/E: power-aware high-performance cluster with commodity technology.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Performance Improvement by Data Management Layer in a Grid RPC System.
Proceedings of the Advances in Grid and Pervasive Computing, 2006

Empirical study on Reducing Energy of Parallel Programs using Slack Reclamation by DVFS in a Power-scalable High Performance Cluster.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

PACS-CS: A Large-Scale Bandwidth-Aware PC Cluster for Scientific Computations.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Robust Posture Estimation of the Human Face in Rapid Lighting Changes using a 3-D Reference Picture.
Proceedings of the Canadian Conference on Electrical and Computer Engineering, 2006

2005
An algorithm for multiple-precision floating-point multiplication.
Appl. Math. Comput., 2005

MegaProto: 1 TFlops/10kW Rack Is Feasible Even with Only Commodity Technology.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

A Hybrid MPI/OpenMP Implementation of a Parallel 3-D FFT on SMP Clusters.
Proceedings of the Parallel Processing and Applied Mathematics, 2005

Computation of High-Precision Mathematical Constants in a Combined Cluster and Grid Environment.
Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005

Design of a Software Distributed Shared Memory System using an MPI communication layer.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Low-cost High-bandwidth Tree Network for PC Clusters based on Tagged-VLAN Technology.
Proceedings of the 8th International Symposium on Parallel Architectures, 2005

Empirical Study for Optimization of Power-Performance with On-Chip Memory.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

MegaProto: A Low-Power and Compact Cluster for High-Performance Computing.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Grid Environment for Computational Astrophysics Driven by GRAPE-6 with HMCS-G and OmniRPC.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Low Temperature Limit of Equations - Hidden Discrete Structure.
Proceedings of the CCA 2005, 2005

2004
A stochastic model for solitons.
Random Struct. Algorithms, 2004

SCIMA-SMP: on-chip memory processor architecture for SMP.
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Heterogeneous Remote Computing System for Computational Astrophysics with OmniRPC.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

Performance Evaluation of OmniRPC in a Grid Environment.
Proceedings of the 2004 Symposium on Applications and the Internet Workshops (SAINT 2004 Workshops), 2004

An Implementation of Parallel 3-D FFT Using Short Vector SIMD Instructions on Clusters of PCs.
Proceedings of the Applied Parallel Computing, 2004

A Parallel Method for Large Sparse Generalized Eigenvalue Problems by OmniRPC in a Grid Environment.
Proceedings of the Applied Parallel Computing, 2004

Parallel Implementation of Strassen's Matrix Multiplication Algorithm for Heterogeneous Clusters.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Implementation and performance evaluation of CONFLEX-G: grid-enabled molecular conformational space search program with OmniRPC.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Formation of Dwarf Galaxies in Reionized Universe with Heterogeneous Multi-computer System.
Proceedings of the Computational Science, 2004

2003
A parallel 1-D FFT algorithm for the Hitachi SR8000.
Parallel Comput., 2003

Performance Evaluation of the Hitachi SR8000 Using SPEC OMP2001 Benchmarks.
Int. J. Parallel Program., 2003

An OpenMP Implementation of Parallel FFT and Its Performance on IA-64 Processors.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

RI2N - Interconnection Network System for Clusters with Wide-Bandwidth and Fault-Tolerancy Based on Multiple Links.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

OmniRPC: a Grid RPC System for Parallel Programming in Cluster and Grid Environment.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

HMCS-G: Grid-enabled Hybrid Computing System for Computational Astrophysics.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers.
Proceedings of the Applied Parallel Computing Advanced Scientific Computing, 2002

Performance Evaluation of the Hitachi SR8000 Using OpenMP Benchmarks.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

A Blocking Algorithm for Parallel 1-D FFT on Clusters of PCs.
Proceedings of the Euro-Par 2002, 2002

2001
An extended split-radix FFT algorithm.
IEEE Signal Process. Lett., 2001

A Mixed-Radix Parallel Three-Dimensional FFT Algorithm on Clusters of Vector SMPs.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

A Blocking Algorithm for FFT on Cache-Based Processors.
Proceedings of the High-Performance Computing and Networking, 9th International Conference, 2001

2000
High-Performance Radix-2, 3 and 5 Parallel 1-D Complex FFT Algorithms for Distributed-Memory Parallel Computers.
J. Supercomput., 2000

A fast algorithm for computing large Fibonacci numbers.
Inf. Process. Lett., 2000

A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs.
Proceedings of the Applied Parallel Computing, 2000

A Performance Study on a Single Processing Node of the HITACHI SR8000.
Proceedings of the Numerical Analysis and Its Applications, 2000

Implementation of Multiple-Precision Parallel Division and Square Root on Distributed-Memory Parallel Computers.
Proceedings of the 2000 International Workshop on Parallel Processing, 2000

A new radix-6 FFT algorithm suitable for multiply-add instruction.
Proceedings of the IEEE International Conference on Acoustics, 2000

1999
Fast High-Precision Arithmetic on Distributed Memory Parallel Machines.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

