Stanimire Tomov
ORCID: 0000-0002-5937-7959
Affiliations:
- University of Tennessee, Knoxville, TN, USA
According to our database, Stanimire Tomov authored at least 185 papers between 2004 and 2024.
Bibliography
2024
Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications.
Future Gener. Comput. Syst., 2024
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
2022
Proceedings of the 9th Workshop on Accelerator Programming Using Directives, 2022
Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the Computational Science - ICCS 2022, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2022
2021
ACM Trans. Math. Softw., 2021
J. Open Source Softw., 2021
Int. J. High Perform. Comput. Appl., 2021
Int. J. High Perform. Comput. Appl., 2021
Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems.
IEEE Access, 2021
Proceedings of the Parallel Computing Technologies, 2021
A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021
Proceedings of the Workshop on Exascale MPI, 2021
2020
ACM Trans. Parallel Comput., 2020
Matrix multiplication on batches of small matrices in half and half-complex precisions.
J. Parallel Distributed Comput., 2020
Int. J. High Perform. Comput. Appl., 2020
Concurr. Comput. Pract. Exp., 2020
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020
Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the Computational Science - ICCS 2020, 2020
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs.
Proceedings of the Computational Science - ICCS 2020, 2020
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020
2019
IEEE Trans. Parallel Distributed Syst., 2019
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices.
Parallel Comput., 2019
Int. J. High Perform. Comput. Netw., 2019
Concurr. Comput. Pract. Exp., 2019
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019
Hands-On Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments.
Proceedings of the High Performance Computing, 2019
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing.
Proceedings of the High Performance Computing, 2019
Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019
Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019
2018
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations.
IEEE Trans. Parallel Distributed Syst., 2018
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs.
IEEE Trans. Parallel Distributed Syst., 2018
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale.
SIAM Rev., 2018
Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs.
Parallel Comput., 2018
J. Comput. Sci., 2018
Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures.
J. Comput. Sci., 2018
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers.
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques.
Proceedings of the Computational Science - ICCS 2018, 2018
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018
Proceedings of the 25th IEEE International Conference on High Performance Computing Workshops, 2018
2017
J. Comput. Sci., 2017
Int. J. High Perform. Comput. Appl., 2017
IEEE Embed. Syst. Lett., 2017
Concurr. Comput. Pract. Exp., 2017
Concurr. Comput. Pract. Exp., 2017
Proceedings of the High Performance Computing - 32nd International Conference, 2017
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017
Proceedings of the General Purpose GPUs, 2017
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs.
Proceedings of the International Conference on Supercomputing, 2017
Proceedings of the International Conference on Computational Science, 2017
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
Proceedings of the Handbook of Big Data Technologies, 2017
2016
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU.
ACM Trans. Math. Softw., 2016
Acta Numer., 2016
Proceedings of the High Performance Computing - 31st International Conference, 2016
Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs.
Proceedings of the International Conference on Computational Science 2016, 2016
Proceedings of the International Conference on Computational Science 2016, 2016
LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016
Proceedings of the Euro-Par 2016: Parallel Processing, 2016
2015
Supercomput. Front. Innov., 2015
Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations.
Sci. Program., 2015
Sci. Program., 2015
Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs.
SIAM J. Sci. Comput., 2015
Int. J. High Perform. Comput. Appl., 2015
On the Design, Development, and Analysis of Optimized Matrix-Vector Multiplication Routines for Coprocessors.
Proceedings of the High Performance Computing - 30th International Conference, 2015
A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations.
Proceedings of the High Performance Computing - 30th International Conference, 2015
Performance analysis and design of a Hessenberg reduction using stabilized blocked elementary transformations for new architectures.
Proceedings of the Symposium on High Performance Computing, 2015
Proceedings of the Symposium on High Performance Computing, 2015
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Efficient implementation of quantum materials simulations on distributed CPU-GPU systems.
Proceedings of the International Conference for High Performance Computing, 2015
Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs.
Proceedings of the International Conference for High Performance Computing, 2015
Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators.
Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2015
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
Energy efficiency and performance frontiers for sparse computations on GPU supercomputers.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform.
Proceedings of the International Conference on Computational Science, 2015
MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015
2014
Supercomput. Front. Innov., 2014
A novel hybrid CPU-GPU generalized eigensolver for electronic structure calculations based on fine-grained memory aware tasks.
Int. J. High Perform. Comput. Appl., 2014
Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems.
Concurr. Comput. Pract. Exp., 2014
Mixed-Precision Orthogonalization Scheme and Adaptive Step Size for Improving the Stability and Performance of CA-GMRES on GPUs.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Self-adaptive Multiprecision Preconditioners on Multicore and Manycore Architectures.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014
Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster.
Proceedings of the International Conference for High Performance Computing, 2014
Performance and portability with OpenCL for throughput-oriented HPC workloads across accelerators, coprocessors, and multicore processors.
Proceedings of the 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2014
Proceedings of the International Workshop on OpenCL, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
A Step towards Energy Efficient Computing: Redesigning a Hydrodynamic Application on CPU-GPU.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Dynamically Balanced Synchronization-Avoiding LU Factorization with Multicore and GPUs.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014
Proceedings of the Numerical Computations with GPUs, 2014
2013
ACM Trans. Math. Softw., 2013
J. Parallel Distributed Comput., 2013
J. Comput. Sci., 2013
Leading Edge Hybrid Multi-GPU Algorithms for Generalized Eigenproblems in Electronic Structure Calculations.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
Portable HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi.
Proceedings of the Parallel Processing and Applied Mathematics, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication.
Proceedings of the International Conference on Supercomputing, 2013
2012
IEEE Trans. Parallel Distributed Syst., 2012
SIAM J. Sci. Comput., 2012
Proceedings of the International Conference on Computational Science, 2012
A Class of Communication-avoiding Algorithms for Solving General Dense Linear Systems on CPU/GPU Parallel Machines.
Proceedings of the International Conference on Computational Science, 2012
Proceedings of the International Conference on Computational Science, 2012
From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming.
Parallel Comput., 2012
Poster: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Abstract: A Novel Hybrid CPU-GPU Generalized Eigensolver for Electronic Structure Calculations Based on Fine Grained Memory Aware Tasks.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems.
Proceedings of the International Conference on Supercomputing, 2012
Proceedings of the Transition of HPC Towards Exascale Computing, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
Proceedings of the High-Performance Scientific Computing - Algorithms and Applications., 2012
2011
Proceedings of the Conference on High Performance Computing Networking, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the International Conference on Parallel Processing, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
Proceedings of the 9th IEEE/ACS International Conference on Computer Systems and Applications, 2011
2010
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing.
Parallel Comput., 2010
Parallel Comput., 2010
Int. J. High Perform. Comput. Appl., 2010
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 39th International Conference on Parallel Processing, 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
Proceedings of the Scientific Computing with Multicore and Accelerators., 2010
2009
Comput. Phys. Commun., 2009
Proceedings of the 2009 ACM Symposium on Applied Computing (SAC), 2009
2008
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy.
ACM Trans. Math. Softw., 2008
State-of-the-art eigensolvers for electronic structure calculations of large scale nano-systems.
J. Comput. Phys., 2008
2007
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications., 2007
The use of bulk states to accelerate the band edge state calculation of a semiconductor quantum dot.
J. Comput. Phys., 2007
2006
Conjugate-gradient eigenvalue solvers in computing electronic properties of nanostructure architectures.
Int. J. Comput. Sci. Eng., 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006
Exploiting Mixed Precision Floating Point Hardware in Scientific Computations.
Proceedings of the High Performance Computing and Grids in Action, 2006
2005
Explicit and Averaging A Posteriori Error Estimates for Adaptive Finite Volume Methods.
SIAM J. Numer. Anal., 2005
Benchmarking and implementation of probability-based simulations on programmable graphics cards.
Comput. Graph., 2005
Comparison of Nonlinear Conjugate-Gradient Methods for Computing the Electronic Properties of Nanostructure Architectures.
Proceedings of the Computational Science, 2005
2004
CoRR, 2004
Application of interactive parallel visualization for commodity-based clusters using visualization APIs.
Comput. Graph., 2004
Proceedings of the 17th IEEE Symposium on Computer-Based Medical Systems (CBMS 2004), 2004