Jack J. Dongarra
ORCID: 0000-0003-3247-1782
Affiliations:
- University of Tennessee, Knoxville, TN, USA
- Oak Ridge National Laboratory, TN, USA
- University of Manchester, Manchester, UK
According to our database, Jack J. Dongarra authored at least 809 papers between 1976 and 2024.
Awards
- Turing Award, 2021: "For pioneering contributions to numerical algorithms and libraries that enabled high performance computational software to keep pace with exponential hardware improvements for over four decades."
- ACM Fellow, 2001: "For contributions in the field of scientific computing, the development of mathematical software, parallel methods, and enabling technologies for high-performance computing."
- IEEE Fellow, 2000: "For contributions and leadership in the field of computational mathematics."
Online presence:
- zbmath.org
- scopus.com
- acm.org
- viaf.org
- orcid.org
- id.loc.gov
- d-nb.info
- netlib.org
- isni.org
- dl.acm.org
Bibliography
2024
Numerical eigen-spectrum slicing, accurate orthogonal eigen-basis, and mixed-precision eigenvalue refinement using OpenMP data-dependent tasks and accelerator offload.
Int. J. High Perform. Comput. Appl., 2024
XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing.
CoRR, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Trends in Computational Science: Natural Language Processing and Network Analysis of 23 Years of ICCS Publications.
Proceedings of the Computational Science - ICCS 2024, 2024
2023
Combining multitask and transfer learning with deep Gaussian processes for autotuning-based performance engineering.
Int. J. High Perform. Comput. Appl., July, 2023
Randomized Numerical Linear Algebra: A Perspective on the Field With an Eye to Software.
CoRR, 2023
Task-Based Polar Decomposition Using SLATE on Massively Parallel Systems with Hardware Accelerators.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
2022
Reproducability Artifact for Running SLATE's GEMM and POTRF Operations on Summit and Crusher.
Dataset, August, 2022
IEEE Trans. Parallel Distributed Syst., 2022
IEEE Trans. Parallel Distributed Syst., 2022
Accelerating Geostatistical Modeling and Prediction With Mixed-Precision Computations: A High-Productivity Approach With PaRSEC.
IEEE Trans. Parallel Distributed Syst., 2022
Int. J. Netw. Comput., 2022
Mixed-Precision Algorithm for Finding Selected Eigenvalues and Eigenvectors of Symmetric and Hermitian Matrices.
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022
Proceedings of the IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Heterogeneous Systems, 2022
Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers, 2022
High-Performance GMRES Multi-Precision Benchmark: Design, Performance, and Challenges.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022
Proceedings of the IEEE/ACM International Workshop on Performance, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the Computational Science - ICCS 2022, 2022
Deep Gaussian process with multitask and transfer learning for performance optimization.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
Proceedings of the 24th IEEE Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, 2022
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2022
2021
ACM Trans. Math. Softw., 2021
20 years of computational science: Selected papers from 2020 International Conference on Computational Science.
J. Comput. Sci., 2021
Int. J. High Perform. Comput. Appl., 2021
Int. J. High Perform. Comput. Appl., 2021
Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems.
IEEE Access, 2021
Proceedings of the Parallel Computing Technologies, 2021
Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the Workshop on Exascale MPI, 2021
2020
Software for Linear Algebra Targeting Exascale (SLATE) with a Recursive Butterfly Transform based solver.
Dataset, August, 2020
ACM Trans. Parallel Comput., 2020
Matrix multiplication on batches of small matrices in half and half-complex precisions.
J. Parallel Distributed Comput., 2020
Computational Science in the Interconnected World: Selected papers from 2019 International Conference on Computational Science.
J. Comput. Sci., 2020
Int. J. High Perform. Comput. Appl., 2020
Concurr. Comput. Pract. Exp., 2020
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020
Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020
Evaluating the Performance of NVIDIA's A100 Ampere GPU for Sparse and Batched Computations.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020
Extreme-Scale Task-Based Cholesky Factorization Toward Climate and Weather Prediction Applications.
Proceedings of the PASC '20: Platform for Advanced Scientific Computing Conference, Geneva, Switzerland, June 29, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the Computational Science - ICCS 2020, 2020
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs.
Proceedings of the Computational Science - ICCS 2020, 2020
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
Proceedings of the IEEE International Conference on Cluster Computing, 2020
Proceedings of the IEEE International Conference on Cluster Computing, 2020
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020
2019
IEEE Trans. Parallel Distributed Syst., 2019
ACM Trans. Math. Softw., 2019
Parallel Comput., 2019
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Comput., 2019
Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms.
Parallel Comput., 2019
Variable-size batched Gauss-Jordan elimination for block-Jacobi preconditioning on graphics processors.
Parallel Comput., 2019
J. Comput. Sci., 2019
Int. J. Netw. Comput., 2019
Int. J. High Perform. Comput. Netw., 2019
Int. J. High Perform. Comput. Appl., 2019
Int. J. High Perform. Comput. Appl., 2019
Concurr. Comput. Pract. Exp., 2019
Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers.
Concurr. Comput. Pract. Exp., 2019
Hands-On Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments.
Proceedings of the High Performance Computing, 2019
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing.
Proceedings of the High Performance Computing, 2019
Evaluation of Programming Models to Address Load Imbalance on Distributed Multi-Core CPUs: A Case Study with Block Low-Rank Factorization.
Proceedings of the 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI, 2019
Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019
Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019
Towards Continuous Benchmarking: An Automated Performance Evaluation Framework for High Performance Software.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2019
Characterization of Power Usage and Performance in Data-Intensive Applications Using MapReduce over MPI.
Proceedings of the Parallel Computing: Technology Trends, 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the ACM International Conference on Supercomputing, 2019
Proceedings of the 48th International Conference on Parallel Processing, 2019
Increasing Accuracy of Iterative Refinement in Limited Floating-Point Arithmetic on Half-Precision Accelerators.
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
Proceedings of the Euro-Par 2019: Parallel Processing, 2019
2018
IEEE Trans. Parallel Distributed Syst., 2018
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.
IEEE Trans. Parallel Distributed Syst., 2018
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.
IEEE Trans. Parallel Distributed Syst., 2018
Supercomput. Front. Innov., 2018
Supercomput. Front. Innov., 2018
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018
Autotuning Numerical Dense Linear Algebra for Batched Computation With GPU Hardware Accelerators.
Proc. IEEE, 2018
Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs.
Parallel Comput., 2018
Parallel Comput., 2018
Frontiers Inf. Technol. Electron. Eng., 2018
Using Jacobi iterations and blocking for solving sparse triangular systems in incomplete factorization preconditioning.
J. Parallel Distributed Comput., 2018
J. Comput. Sci., 2018
J. Comput. Sci., 2018
Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.
J. Comput. Sci., 2018
Int. J. High Perform. Comput. Appl., 2018
Int. J. High Perform. Comput. Appl., 2018
Int. J. Comput. Sci. Eng., 2018
SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks.
CoRR, 2018
Concurr. Comput. Pract. Exp., 2018
The 30th Anniversary of the Supercomputing Conference: Bringing the Future Closer - Supercomputing History and the Immortality of Now.
Computer, 2018
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers.
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018
2017
Design and Implementation of the PULSAR Programming System for Large Scale Computing.
Supercomput. Front. Innov., 2017
J. Comput. Sci., 2017
Int. J. Parallel Program., 2017
Guest Editor's Note: Special Issue on Clusters, Clouds and Data for Scientific Computing.
Int. J. High Perform. Comput. Appl., 2017
Int. J. High Perform. Comput. Appl., 2017
Int. J. High Perform. Comput. Appl., 2017
IEEE Embed. Syst. Lett., 2017
Concurr. Comput. Pract. Exp., 2017
Concurr. Comput. Pract. Exp., 2017
Proceedings of the High Performance Computing - 32nd International Conference, 2017
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the General Purpose GPUs, 2017
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017
Improving Performance of GMRES by Reducing Communication and Pipelining Global Collectives.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Bidiagonalization and R-Bidiagonalization: Parallel Tiled Algorithms, Critical Paths and Distributed-Memory Implementation.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.
Proceedings of the International Conference on Supercomputing, 2017
Variable-Size Batched LU for Small Matrices and Its Integration into Block-Jacobi Preconditioning.
Proceedings of the 46th International Conference on Parallel Processing, 2017
The Art of Computational Science, Bridging Gaps - Forming Alloys. Preface for ICCS 2017.
Proceedings of the International Conference on Computational Science, 2017
The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the International Conference on Computational Science, 2017
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
Scaling point set registration in 3D across thread counts on multicore and hardware accelerator platforms through autotuning for large scale analysis of scientific point clouds.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
Proceedings of the Handbook of Big Data Technologies, 2017
2016
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016
Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs.
IEEE Trans. Parallel Distributed Syst., 2016
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU.
ACM Trans. Math. Softw., 2016
Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results.
Parallel Comput., 2016
Numer. Algorithms, 2016
High-performance conjugate-gradient benchmark: A new metric for ranking high-performance computing systems.
Int. J. High Perform. Comput. Appl., 2016
Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs.
Concurr. Comput. Pract. Exp., 2016
Acta Numer., 2016
Proceedings of the High Performance Computing for Computational Science - VECPAR 2016, 2016
Proceedings of the High Performance Computing, 2016
Proceedings of the High Performance Computing - 31st International Conference, 2016
Performance-Portable Autotuning of OpenCL Kernels for Convolutional Layers of Deep Neural Networks.
Proceedings of the 2nd Workshop on Machine Learning in HPC Environments, 2016
Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Hessenberg Reduction with Transient Error Resilience on GPU-Based Hybrid Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Proceedings of the International Conference on Computational Science 2016, 2016
Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016
Proceedings of the International Conference on Computational Science 2016, 2016
LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
2015
Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.
ACM Trans. Parallel Comput., 2015
Supercomput. Front. Innov., 2015
Supercomput. Front. Innov., 2015
Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations.
Sci. Program., 2015
Sci. Program., 2015
Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs.
SIAM J. Sci. Comput., 2015
Guest Editors' Note: Special Issue on Clusters, Clouds and Data for Scientific Computing.
Parallel Process. Lett., 2015
Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers.
J. Parallel Distributed Comput., 2015
Int. J. Netw. Comput., 2015
Int. J. High Perform. Comput. Appl., 2015
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems.
Concurr. Comput. Pract. Exp., 2015
Concurr. Comput. Pract. Exp., 2015
Concurr. Comput. Pract. Exp., 2015
On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors.
Proceedings of the High Performance Computing - 30th International Conference, 2015
A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.
Proceedings of the High Performance Computing - 30th International Conference, 2015
Proceedings of the High Performance Computing - 30th International Conference, 2015
Performance analysis and design of a Hessenberg reduction using stabilized blocked elementary transformations for new architectures.
Proceedings of the Symposium on High Performance Computing, 2015
Proceedings of the Symposium on High Performance Computing, 2015
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster.
Proceedings of the International Conference for High Performance Computing, 2015
Efficient implementation of quantum materials simulations on distributed CPU-GPU systems.
Proceedings of the International Conference for High Performance Computing, 2015
Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015
Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
GPU-accelerated co-design of induced dimension reduction: algorithmic fusion and kernel overlap.
Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing, 2015
Proceedings of the 2015 14th RoEduNet International Conference, 2015
Proceedings of the 22nd European MPI Users' Group Meeting, 2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
Energy efficiency and performance frontiers for sparse computations on GPU supercomputers.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform.
Proceedings of the International Conference on Computational Science, 2015
MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
Proceedings of the Euro-Par 2015: Parallel Processing, 2015
PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015
2014
Supercomput. Front. Innov., 2014
SIAM J. Matrix Anal. Appl., 2014
Design and Implementation of a Large Scale Tree-Based QR Decomposition Using a 3D Virtual Systolic Array and a Lightweight Runtime.
Parallel Process. Lett., 2014
An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems.
Parallel Comput., 2014
Int. J. Netw. Comput., 2014
A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks.
Int. J. High Perform. Comput. Appl., 2014
Comput. Sci. Res. Dev., 2014
Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems.
Concurr. Comput. Pract. Exp., 2014
Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting.
Concurr. Comput. Pract. Exp., 2014
Concurr. Comput. Pract. Exp., 2014
Comput. J., 2014
Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Accelerating Computation of Eigenvectors in the Dense Nonsymmetric Eigenvalue Problem.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014
Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014
Proceedings of the International Workshop on OpenCL, 2014
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores.
Proceedings of the 2014 International Conference on Supercomputing, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the International Conference on Computational Science, 2014
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014
Power monitoring with PAPI for extreme scale architectures and dataflow-based programming models.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014
Proceedings of the Numerical Computations with GPUs, 2014
2013
IEEE Trans. Parallel Distributed Syst., 2013
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures.
ACM Trans. Math. Softw., 2013
Level-3 Cholesky Factorization Routines Improve Performance of Many Cholesky Algorithms.
ACM Trans. Math. Softw., 2013
ACM Trans. Math. Softw., 2013
J. Supercomput., 2013
Parallel Comput., 2013
Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms.
J. Parallel Distributed Comput., 2013
J. Parallel Distributed Comput., 2013
J. Comput. Sci., 2013
Int. J. High Perform. Comput. Appl., 2013
Int. J. High Perform. Comput. Appl., 2013
Correlated set coordination in fault tolerant message logging protocols for many-core clusters.
Concurr. Comput. Pract. Exp., 2013
Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.
Concurr. Comput. Pract. Exp., 2013
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures.
Proceedings of the IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, 2013
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.
Proceedings of the International Conference on Supercomputing, 2013
Proceedings of the International Conference on Computational Science, 2013
Proceedings of the International Conference on Computational Science, 2013
Proceedings of the IEEE High Performance Extreme Computing Conference, 2013
Proceedings of the Euro-Par 2013 Parallel Processing, 2013
Implementing a Systolic Algorithm for QR Factorization on Multicore Clusters with PaRSEC.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013
2012
IEEE Trans. Parallel Distributed Syst., 2012
SIGMETRICS Perform. Evaluation Rev., 2012
SIAM J. Sci. Comput., 2012
Toward a High Performance Tile Divide and Conquer Algorithm for the Dense Symmetric Eigenvalue Problem.
SIAM J. Sci. Comput., 2012
Proceedings of the International Conference on Computational Science, 2012
Proceedings of the International Conference on Computational Science, 2012
A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines.
Proceedings of the International Conference on Computational Science, 2012
Proceedings of the International Conference on Computational Science, 2012
Proceedings of the International Conference on Computational Science, 2012
From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming.
Parallel Comput., 2012
Parallel Comput., 2012
Profiling high performance dense linear algebra algorithms on multicore architectures for power and energy efficiency.
Comput. Sci. Res. Dev., 2012
Proceedings of the High Performance Computing for Computational Science, 2012
Proceedings of the High Performance Computing for Computational Science, 2012
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012
Poster: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Abstract: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012
HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
A Comprehensive Study of Task Coalescing for Selecting Parallelism Granularity in a Two-Stage Bidiagonal Reduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
A Parallel Tiled Solver for Dense Symmetric Indefinite Systems on Multicore Architectures.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems.
Proceedings of the International Conference on Supercomputing, 2012
Proceedings of the IEEE Conference on High Performance Extreme Computing, 2012
Proceedings of the Transition of HPC Towards Exascale Computing, 2012
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
GPU-Accelerated Asynchronous Error Correction for Mixed Precision Iterative Refinement.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012
Energy Footprint of Advanced Dense Numerical Linear Algebra Using Tile Algorithms on Multicore Architectures.
Proceedings of the 2012 Second International Conference on Cloud and Green Computing, 2012
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications., 2012
2011
J. Comput. Phys., 2011
Int. J. High Perform. Comput. Appl., 2011
Selected papers of the Workshop on Clusters, Clouds and Grids for Scientific Computing (CCGSC).
Int. J. High Perform. Comput. Appl., 2011
Int. J. High Perform. Comput. Appl., 2011
Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community.
Comput. Sci. Eng., 2011
Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures.
Concurr. Comput. Pract. Exp., 2011
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, 2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011
Parallel reduction to condensed forms for symmetric eigenvalue problems using aggregated fine-grained and memory-aware kernels.
Proceedings of the Conference on High Performance Computing Networking, 2011
High performance matrix inversion based on LU factorization for multicore architectures.
Proceedings of the 2011 ACM International Workshop on Many Task Computing on Grids and Supercomputers, 2011
Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW.
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Reducing the Time to Tune Parallel Dense Linear Algebra Routines with Partial Execution and Performance Modeling.
Proceedings of the Parallel Processing and Applied Mathematics, 2011
Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction.
Proceedings of the Parallel Processing and Applied Mathematics, 2011
Proceedings of the Parallel Processing and Applied Mathematics, 2011
Solving the Generalized Symmetric Eigenvalue Problem using Tile Algorithms on Multicore Architectures.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011
Overlapping Computation and Communication for Advection on Hybrid Parallel Computers.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Two-Stage Tridiagonal Reduction for Dense Symmetric Matrices Using Tile Algorithms on Multicore Architectures.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Kernel Assisted Collective Intra-node MPI Communication among Multi-Core and Many-Core CPUs.
Proceedings of the International Conference on Parallel Processing, 2011
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011
Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011
2010
Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2010
Rectangular full packed format for Cholesky's algorithm: factorization, solution, and inversion.
ACM Trans. Math. Softw., 2010
Scheduling two-sided transformations using tile algorithms on multicore architectures.
Sci. Program., 2010
Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI.
Proceedings of the International Conference on Computational Science, 2010
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing.
Parallel Comput., 2010
Parallel Comput., 2010
Int. J. High Perform. Comput. Appl., 2010
Future Gener. Comput. Syst., 2010
Concurr. Comput. Pract. Exp., 2010
Concurr. Comput. Pract. Exp., 2010
Concurr. Comput. Pract. Exp., 2010
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010
Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010
Proceedings of the Conference on High Performance Computing Networking, 2010
Proceedings of the Recent Advances in the Message Passing Interface, 2010
Proceedings of the Recent Advances in the Message Passing Interface, 2010
Proceedings of the Applied Parallel and Scientific Computing, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 39th International Conference on Parallel Processing, 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
2009
IEEE Trans. Computers, 2009
Optimizing matrix multiplication for a short-vector SIMD architecture - CELL processor.
Parallel Comput., 2009
Parallel Comput., 2009
Numer. Linear Algebra Appl., 2009
J. Parallel Distributed Comput., 2009
Int. J. High Perform. Comput. Appl., 2009
The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community.
Int. J. High Perform. Comput. Appl., 2009
Comput. Phys. Commun., 2009
Paravirtualization effect on single- and multi-threaded memory-intensive linear algebra software.
Clust. Comput., 2009
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Comparative study of one-sided factorizations with multiple software packages on multi-core hardware.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009
Proceedings of the Tools for High Performance Computing 2009, 2009
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009
Proceedings of the ICPP 2009, 2009
Proceedings of the Computational Science, 2009
A Holistic Approach for Performance Measurement and Analysis for Petascale Applications.
Proceedings of the Computational Science, 2009
Analytical modeling and optimization for affinity based thread scheduling on multicore systems.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
Proceedings of the Birth of Numerical Analysis, 2009
Wiley series on parallel and distributed computing, Wiley, ISBN: 978-0-470-04039-3, 2009
2008
Solving Systems of Linear Equations on the CELL Processor Using Cholesky Factorization.
IEEE Trans. Parallel Distributed Syst., 2008
IEEE Trans. Parallel Distributed Syst., 2008
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy.
ACM Trans. Math. Softw., 2008
State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems.
J. Comput. Phys., 2008
Int. J. Found. Comput. Sci., 2008
Future Gener. Comput. Syst., 2008
Future Gener. Comput. Syst., 2008
Future Gener. Comput. Syst., 2008
Concurr. Comput. Pract. Exp., 2008
IEEE Ann. Hist. Comput., 2008
DARPA's HPCS Program: History, Models, Tools, Languages.
Adv. Comput., 2008
Proceedings of the High Performance Computing for Computational Science, 2008
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Fast and Small Short Vector SIMD Matrix Multiplication Kernels for the Synergistic Processing Element of the CELL Processor.
Proceedings of the Computational Science, 2008
The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008
Proceedings of the 11th IEEE High Assurance Systems Engineering Symposium, 2008
Proceedings of the Seventh International Conference on Grid and Cooperative Computing, 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
2007
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications., 2007
SIAM J. Sci. Comput., 2007
Improved Runtime and Transfer Time Prediction Mechanisms in a Network Enabled Servers Middleware.
Parallel Process. Lett., 2007
The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot.
J. Comput. Phys., 2007
High Performance Development for High End Computing With Python Language Wrapper (PLW).
Int. J. High Perform. Comput. Appl., 2007
Mixed Precision Iterative Refinement Techniques for the Solution of Dense Linear Systems.
Int. J. High Perform. Comput. Appl., 2007
Concurr. Comput. Pract. Exp., 2007
Implementation of mixed precision in solving systems of linear equations on the Cell processor.
Concurr. Comput. Pract. Exp., 2007
Editorial introduction to the special issue on computational linear algebra and sparse matrix computations.
Appl. Algebra Eng. Commun. Comput., 2007
Bi-objective scheduling algorithms for optimizing makespan and reliability on heterogeneous systems.
Proceedings of SPAA 2007: the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2007
Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007
Proceedings of the Eighth International Conference on Parallel and Distributed Computing, 2007
Proceedings of the On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, 2007
Proceedings of the Parallel and Distributed Processing and Applications, 2007
Self Adaptive Application Level Fault Tolerance for Parallel and Distributed Computing.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007
Scalability Analysis of the SPEC OpenMP Benchmarks on Large-Scale Shared Memory Multiprocessors.
Proceedings of the Computational Science - ICCS 2007, 7th International Conference, Beijing, China, May 27, 2007
Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 2007
Proceedings of the Euro-Par 2007, 2007
On Using Incremental Profiling for the Performance Analysis of Shared Memory Parallel Applications.
Proceedings of the Euro-Par 2007, 2007
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007
2006
Int. J. High Perform. Comput. Appl., 2006
Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures.
Int. J. Comput. Sci. Eng., 2006
Future Gener. Comput. Syst., 2006
Future Gener. Comput. Syst., 2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Tools and techniques for performance - Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems).
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Poster reception - Targeting multi-core architectures for linear algebra applications.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006
Implementing Linear Algebra Routines on Multi-core Processors with Pipelining and a Look Ahead.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006
Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the Grid-Based Problem Solving Environments, 2006
The Impact of Multicore on Math Software and Exploiting Single Precision Computing to Obtain Double Precision Results.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006
Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006
Proceedings of the Parallel Processing for Scientific Computing, 2006
Proceedings of the Handbook of Nature-Inspired and Innovative Computing, 2006
Engineering the grid - status and perspective.
American Scientific Publishers, ISBN: 978-1-58883-038-8, 2006
2005
Proc. IEEE, 2005
Parallel Comput., 2005
Int. J. Parallel Program., 2005
Int. J. Parallel Program., 2005
Int. J. High Perform. Comput. Appl., 2005
Int. J. High Perform. Comput. Appl., 2005
Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing.
Int. J. High Perform. Comput. Appl., 2005
Future Gener. Comput. Syst., 2005
Comput. Sci. Eng., 2005
Concurr. Pract. Exp., 2005
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005
Proceedings of the Large-Scale Scientific Computing, 5th International Conference, 2005
NetSolve/D: A Massively Parallel Grid Execution System for Scalable Data Intensive Collaboration.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005
Comparison of Nonlinear Conjugate-Gradient Methods for Computing the Electronic Properties of Nanostructure Architectures.
Proceedings of the Computational Science, 2005
Proceedings of the Computational Science, 2005
Processes Distribution of Homogeneous Parallel Linear Algebra Routines on Heterogeneous Clusters.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005
2004
GrADSolve a grid-based RPC system for parallel computing with application-level scheduling.
J. Parallel Distributed Comput., 2004
Int. J. High Perform. Comput. Appl., 2004
Int. J. High Perform. Comput. Appl., 2004
Int. J. High Perform. Comput. Appl., 2004
TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004
Open MPI's TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004
Proceedings of the Parallel and Distributed Processing and Applications, 2004
Improvements in the Efficient Composition of Applications Built Using a Component-Based Programming Environment.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004
Design of Interactive Environment for Numerically Intensive Parallel Linear Algebra Calculations.
Proceedings of the Computational Science, 2004
Proceedings of the Computational Science, 2004
Proceedings of the Grid Computing: The New Frontier of High Performance Computing [post-proceedings of the High Performance Computing Workshop, 2004
Proceedings of the 37th Hawaii International Conference on System Sciences (HICSS-37 2004), 2004
Proceedings of the Grid and Cooperative Computing, 2004
Proceedings of the Euro-Par 2004 Parallel Processing, 2004
Proceedings of the 2004 workshop on Memory System Performance, 2004
Proceedings of the Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition, 2004
2003
SRS: A Framework for Developing Malleable and Migratable Parallel Applications for Distributed Systems.
Parallel Process. Lett., 2003
Parallel Comput., 2003
Recent Advances in Parallel Virtual Machine and Message Passing Interface: (Selected Papers from the EuroPVMMPI 2002 Conference).
Int. J. High Perform. Comput. Appl., 2003
Int. J. High Perform. Comput. Appl., 2003
Evaluating the Performance of MPI-2 Dynamic Communicators and One-Sided Communication.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003
Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load.
Proceedings of the 11th Euromicro Workshop on Parallel, 2003
Proceedings of the High Performance Computing, 5th International Symposium, 2003
Optimizing Performance and Reliability in Distributed Computing Systems through Wide Spectrum Storage.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Experiences and Lessons Learned with a Portable Interface to Hardware Performance Counters.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the Computational Science - ICCS 2003, 2003
Proceedings of the Computational Science - ICCS 2003, 2003
Proceedings of the Computational Science - ICCS 2003, 2003
Proceedings of the Computational Science - ICCS 2003, 2003
Proceedings of the Genetic and Evolutionary Computation, 2003
Proceedings of the Euro-Par 2003. Parallel Processing, 2003
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003
2002
ACM Trans. Math. Softw., 2002
A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures.
SIAM J. Sci. Comput., 2002
Active Netlib: An Active Mathematical Software Collection for Inquiry-based Computational Science and Engineering Education.
J. Digit. Inf., 2002
Basic Linear Algebra Subprograms Technical (Blast) Forum Standard (2).
Int. J. High Perform. Comput. Appl., 2002
Basic Linear Algebra Subprograms Technical (Blast) Forum Standard (1).
Int. J. High Perform. Comput. Appl., 2002
Future Gener. Comput. Syst., 2002
Concurr. Comput. Pract. Exp., 2002
Concurr. Comput. Pract. Exp., 2002
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 9th European PVM/MPI Users' Group Meeting, Linz, Austria, September 29, 2002
Active netlib: an active mathematical software collection for inquiry-based computational science & engineering education.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2002
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002
Proceedings of the Grid Computing, 2002
Proceedings of the Grid Computing, 2002
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002
Three Tools to Help with Cluster and Grid Computing: SANS-Effort, PAPI, and NetSolve.
Proceedings of the 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), 2002
2001
Preface: Clusters and Computational Grids for Scientific Computing.
Parallel Process. Lett., 2001
Parallel Comput., 2001
Parallel Comput., 2001
Telescoping Languages: A Strategy for Automatic Generation of Scientific Problem-Solving Systems from Annotated Libraries.
J. Parallel Distributed Comput., 2001
Int. J. High Perform. Comput. Appl., 2001
Int. J. High Perform. Comput. Appl., 2001
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2001
Parallel IO Support for Meta-computing Applications: MPI_Connect IO Applied to PACX-MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2001
Packed Storage Extension for ScaLAPACK.
Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001
Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2001), 2001
Logistical Computing and Internetworking: Middleware for the Use of Storage in Communication.
Proceedings of the 3rd Annual International Workshop on Active Middleware Services (AMS 2001), 2001
Proceedings of the Computational Science - ICCS 2001, 2001
High Performance Computing and Trends: Connecting Computational Requirements with Computing Resources.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001
High Performance Computing and Trends: Connected Computational Requirements with Computing Resources.
Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001
End-user Tools for Application Performance Analysis Using Hardware Counters.
Proceedings of the ISCA 14th International Conference on Parallel and Distributed Computing Systems, 2001
Software, environments, tools 13, SIAM, ISBN: 978-0-89871-504-0, 2001
2000
Int. J. High Perform. Comput. Appl., 2000
The design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines.
Concurr. Pract. Exp., 2000
Proceedings of Supercomputing 2000, 2000
A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters.
Proceedings of Supercomputing 2000, 2000
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2000
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2000
Developing an Architecture to Support the Implementation and Development of Scientific computing Applications.
Proceedings of the Architecture of Scientific Software, 2000
Proceedings of the 2000 International Workshop on Parallel Processing, 2000
A Grid Computing Environment for Enabling Large Scale Quantum Mechanical Simulations.
Proceedings of the Grid Computing, 2000
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
Proceedings of the Handbook on Parallel and Distributed Processing, 2000
Proceedings of the Templates for the Solution of Algebraic Eigenvalue Problems, 2000
1999
IEEE Trans. Parallel Distributed Syst., 1999
A Parallel Divide and Conquer Algorithm for the Symmetric Eigenvalue Problem on Distributed Memory Architectures.
SIAM J. Sci. Comput., 1999
Parallel Distributed Comput. Pract., 1999
A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems.
Parallel Distributed Comput. Pract., 1999
Parallel Process. Lett., 1999
Stochastic Performance Prediction for Iterative Algorithms in Distributed Environments.
J. Parallel Distributed Comput., 1999
Int. J. High Perform. Comput. Appl., 1999
Int. J. High Perform. Comput. Appl., 1999
Future Gener. Comput. Syst., 1999
Future Gener. Comput. Syst., 1999
Future Gener. Comput. Syst., 1999
The Future of the BLAS.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999
A Comparison of Parallel Solvers for Diagonally Dominant and General Narrow-Banded Linear Systems II.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999
Software, Environments and Tools, SIAM, ISBN: 978-0-89871-960-4, 1999
1998
IEEE Trans. Parallel Distributed Syst., 1998
Parallel Comput., 1998
National HPCC Software Exchange (NHSE): Uniting the High Performance Computing and Communications Community.
D Lib Mag., 1998
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998
MPI_Connect Managing Heterogeneous MPI Applications Interoperation and Process Control.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1998
Proceedings of the Applied Parallel Computing, 1998
Proceedings of the Applied Parallel Computing, 1998
Proceedings of the Languages, 1998
Dynamic Reconfiguration and Virtual Machine Management in the Harness Metacomputing System.
Proceedings of the Computing in Object-Oriented Parallel Environments, 1998
Proceedings of the Parallel and Distributed Processing, 10 IPPS/SPDP'98 Workshops Held in Conjunction with the 12th International Parallel Processing Symposium and 9th Symposium on Parallel and Distributed Processing, Orlando, Florida, USA, March 30, 1998
Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998
Proceedings of the Seventh Heterogeneous Computing Workshop, 1998
Proceedings of the 3rd ACM International Conference on Digital Libraries, 1998
1997
ACM Trans. Math. Softw., 1997
The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers.
SIAM J. Sci. Comput., 1997
Parallel Comput., 1997
Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing.
J. Parallel Distributed Comput., 1997
Int. J. High Perform. Comput. Appl., 1997
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1997
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1997
PVMPI Provides Interoperability Between MPI Implementations.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997
A Distributed Memory Implementation of the Nonsymmetric QR Algorithm.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997
A Further Proposal for a Fortran 90 Interface for LAPACK.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997
ScaLAPACK: A Linear Algebra Library for Message-Passing Computers.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997
Proceedings of the High-Performance Computing and Networking, 1997
Proceedings of the 1997 International Conference on Application-Specific Systems, 1997
1996
Design and Implementation of the ScaLAPACK LU, QR, and Cholesky Factorization Routines.
Sci. Program., 1996
Concurr. Pract. Exp., 1996
Proceedings of the Vector and Parallel Processing, 1996
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996
Case studies on the development of ScaLAPACK and the NAG Numerical PVM Library.
Proceedings of the Quality of Numerical Software, 1996
Matrix Market: a web resource for test matrix collections.
Proceedings of the Quality of Numerical Software, 1996
Proceedings of the Parallel Virtual Machine, 1996
Proceedings of the Applied Parallel Computing, 1996
Proceedings of the High-Performance Computing and Networking, 1996
Proceedings of the High-Performance Computing and Networking, 1996
Proceedings of the Euro-Par '96 Parallel Processing, 1996
1995
SIAM Rev., 1995
Performance Study of LU Factorization with Low Communication Overhead on Multiprocessors.
Parallel Process. Lett., 1995
Parallel Comput., 1995
A Parallel Algorithm for the Reduction of a Nonsymmetric Matrix to Block Upper-Hessenberg Form.
Parallel Comput., 1995
The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form.
Numer. Algorithms, 1995
IEEE Parallel Distributed Technol. Syst. Appl., 1995
Proceedings of the ACM SIGSOFT Symposium on Software Reusability, 1995
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995, 1995
Position Paper.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995
Proceedings of the Applied Parallel Computing, 1995
Proceedings of the Applied Parallel Computing, 1995
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance.
Proceedings of the Applied Parallel Computing, 1995
Scalable linear algebra software libraries for distributed memory concurrent computers.
Proceedings of the 5th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS 1995), 1995
Proceedings of the Digest of Papers: FTCS-25, 1995
Management of the National HPCC Software Exchange - A Virtual Distributed Digital Library.
Proceedings of the Second Annual Conference on the Theory and Practice of Digital Libraries, 1995
Proceedings of the Digital Libraries, Research and Technology Advances, 1995
Proceedings of the Computer Science Today: Recent Trends and Developments, 1995
1994
Parallel Comput., 1994
J. Parallel Distributed Comput., 1994
Int. J. High Perform. Comput. Appl., 1994
PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers.
Concurr. Pract. Exp., 1994
Constructing Numerical Software Libraries for High-Performance Computing Environments.
Proceedings of the Parallel Scientific Computing, First International Workshop, 1994
The Design of Scalable Software Libraries for Distributed Memory Concurrent Computers.
Proceedings of the 8th International Symposium on Parallel Processing, 1994
Proceedings of the Third International Symposium on High Performance Distributed Computing, 1994
Other Titles in Applied Mathematics, SIAM, ISBN: 978-1-61197-153-8, 1994
1993
SIAM J. Sci. Comput., 1993
Proc. IEEE, 1993
IEEE Parallel Distributed Technol. Syst. Appl., 1993
LAPACK++: a design overview of object-oriented extensions for high performance linear algebra.
Proceedings of Supercomputing '93, 1993
Two Dimensional Basic Linear Algebra Communication Subprograms.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993
Using PVM 3.0 to Run Grand Challenge Applications on a Heterogeneous Network of Parallel Computers.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993
LAPACK for Distributed Memory Architectures: The Next Generation.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993
Tools for Heterogeneous Network Computing.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993
1992
Algorithm 710: FORTRAN subroutines for computing the eigenvalues and eigenvectors of a general matrix by reduction to general tridiagonal form.
ACM Trans. Math. Softw., 1992
SIGARCH Comput. Archit. News, 1992
SIAM J. Matrix Anal. Appl., 1992
Reduction to condensed form for the eigenvalue problem on distributed memory architectures.
Parallel Comput., 1992
1991
Parallel loops - a test suite for parallelizing compilers: description and example results.
Parallel Comput., 1991
Proceedings of Supercomputing '91, 1991
Solving Computational Grand Challenges Using a Network of Heterogeneous Supercomputers.
Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, 1991
LAPACK for Distributed Memory Architectures: Progress Report.
Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, 1991
Solving linear systems on vector and shared memory computers.
SIAM, ISBN: 978-0-89871-270-4, 1991
1990
Algorithm 679: a set of level 3 basic linear algebra subprograms: model implementation and test programs.
ACM Trans. Math. Softw., 1990
A Tool to Aid in the Design, Implementation, and Understanding of Matrix Algorithms for Parallel Processors.
J. Parallel Distributed Comput., 1990
Proceedings of Supercomputing '90, New York, NY, USA, November 12-16, 1990, 1990
1989
Advanced Computing Research Facility, Mathematics and Computer Science Division, Argonne National Laboratory.
Int. J. High Perform. Comput. Appl., 1989
Evaluating Block Algorithm Variants in LAPACK.
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing, 1989
Proceedings of the 13th Annual International Computer Software and Applications Conference, 1989
1988
ACM Trans. Math. Softw., 1988
Algorithm 656: an extended set of basic linear algebra subprograms: model implementation and test programs.
ACM Trans. Math. Softw., 1988
ACM Trans. Math. Softw., 1988
Parallel Comput., 1988
Parallel Comput., 1988
Proceedings of Supercomputing '88, Orlando, FL, USA, November 12-17, 1988, 1988
1987
Performance of various computers using standard linear equations software in a Fortran environment.
Simul., 1987
Parallel Comput., 1987
SCHEDULE: An Environment for Developing Transportable Explicitly Parallel Codes in Fortran-Abstract.
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, 1987
A Proposal for a Set of Level 3 Basic Linear Algebra Subprograms.
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, 1987
1986
Parallel Comput., 1986
Proceedings of the 1986 Workshop on Applied Computing, 1986
1985
A fully parallel algorithm for the symmetric eigenvalue problem.
Proceedings of the Selected Papers from the Second Conference on Parallel Processing for Scientific Computing, 1985
Proceedings of the 7th IEEE Symposium on Computer Arithmetic, 1985
1984
ACM Trans. Math. Softw., 1984
Parallel Comput., 1984
Multiprocessing linear algebra algorithms on the CRAY X-MP-2: Experiences with small granularity.
J. Parallel Distributed Comput., 1984
1982
Algorithm 589: SICEDR: A FORTRAN Subroutine for Improving the Accuracy of Computed Matrix Eigenvalues.
ACM Trans. Math. Softw., 1982
1979
1977
Lecture Notes in Computer Science 51, Springer, ISBN: 0387082549, 1977
1976
Lecture Notes in Computer Science 6, Springer, ISBN: 0-387-07546-1, 1976