Xiaoye S. Li

ACM Trans. Math. Softw., September, 2023

Newly Released Capabilities in the Distributed-Memory SuperLU Sparse Direct Solver.

ACM Trans. Math. Softw., March, 2023

Recent Trends in Graph Decomposition (Dagstuhl Seminar 23331).

Dagstuhl Reports, 2023

Open Problems in (Hyper)Graph Decomposition.

CoRR, 2023

Construction of Hierarchically Semi-Separable matrix Representation using Adaptive Johnson-Lindenstrauss Sketching.

CoRR, 2023

Brief Announcement: Communication Optimal Sparse LU Factorization for Planar Matrices.

Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters.

Proceedings of the International Conference for High Performance Computing, 2023

Harnessing the Crowd for Autotuning High-Performance Computing Applications.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022

gSoFa: Scalable Sparse Symbolic LU Factorization on GPUs.

Anil Gaihre

Hang Liu

IEEE Trans. Parallel Distributed Syst., 2022

Resiliency in numerical algorithm design for extreme scale simulations.

Int. J. High Perform. Comput. Appl., 2022

Hybrid Models for Mixed Variables in Bayesian Optimization.

CoRR, 2022

Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers.

Proceedings of the SC22: International Conference for High Performance Computing, 2022

GPTuneBand: Multi-task and Multi-fidelity Autotuning for Large-scale High Performance Computing Applications.

Proceedings of the 2022 SIAM Conference on Parallel Processing for Scientific Computing, 2022

Proposed Consistent Exception Handling for the BLAS and LAPACK.

Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

2021

Trust: Triangle Counting Reloaded on GPUs.

IEEE Trans. Parallel Distributed Syst., 2021

Butterfly Factorization Via Randomized Matrix-Vector Multiplications.

SIAM J. Sci. Comput., 2021

Sparse Approximate Multifrontal Factorization with Butterfly Compression for High-Frequency Wave Equations.

SIAM J. Sci. Comput., 2021

A survey of numerical linear algebra methods utilizing mixed-precision arithmetic.

Int. J. High Perform. Comput. Appl., 2021

Non-smooth Bayesian Optimization in Tuning Problems.

CoRR, 2021

Dr. Top-k: delegate-centric Top-k on GPUs.

Proceedings of the International Conference for High Performance Computing, 2021

GPTune: multitask learning for autotuning exascale applications.

Yang Liu

Wissam M. Sid-Lakhdar

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Enhancing Autotuning Capability with a History Database.

Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021

A Message-Driven, Multi-GPU Parallel Sparse Triangular Solver.

Proceedings of the 2021 SIAM Conference on Applied and Computational Discrete Algorithms, 2021

2020

A Distributed-Memory Algorithm for Computing a Heavy-Weight Perfect Matching on Bipartite Graphs.

SIAM J. Sci. Comput., 2020

A parallel hierarchical blocked adaptive cross approximation algorithm.

Yang Liu

Wissam M. Sid-Lakhdar

Elizaveta Rebrova

Pieter Ghysels

Int. J. High Perform. Comput. Appl., 2020

A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic.

CoRR, 2020

GSoFa: Scalable Sparse LU Symbolic Factorization on GPUs.

Anil Gaihre

Hang Liu

CoRR, 2020

C-SAW: a framework for graph sampling and random walk on GPUs.

Proceedings of the International Conference for High Performance Computing, 2020

Leveraging One-Sided Communication for Sparse Triangular Solvers.

Proceedings of the 2020 SIAM Conference on Parallel Processing for Scientific Computing, 2020

Distributed macroscopic traffic simulation with Open Traffic Models.

Gabriel Gomes

Juliette Ugirumurera

Proceedings of the 23rd IEEE International Conference on Intelligent Transportation Systems, 2020

Scalable and Memory-Efficient Kernel Ridge Regression.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019

Robust and Accurate Stopping Criteria for Adaptive Randomized Sampling in Matrix-Free Hierarchically Semiseparable Construction.

SIAM J. Sci. Comput., 2019

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems.

Richard W. Vuduc

J. Parallel Distributed Comput., 2019

Multitask and Transfer Learning for Autotuning Exascale Applications.

Wissam M. Sid-Lakhdar

Mohsen Mahmoudi Aznaveh

James Weldon Demmel

CoRR, 2019

A communication-avoiding 3D sparse triangular solver.

Proceedings of the ACM International Conference on Supercomputing, 2019

H-INDEX: Hash-Indexing for Parallel Triangle Counting on GPUs.

Proceedings of the 2019 IEEE High Performance Extreme Computing Conference, 2019

2018

Consensus Ensemble System for Traffic Flow Prediction.

IEEE Trans. Intell. Transp. Syst., 2018

Efficient Online Hyperparameter Optimization for Kernel Ridge Regression with Applications to Traffic Time Series Prediction.

CoRR, 2018

Matrix-free construction of HSS representation using adaptive randomized sampling.

CoRR, 2018

A unified software framework for solving traffic assignment problems.

CoRR, 2018

A distributed-memory approximation algorithm for maximum weight perfect bipartite matching.

CoRR, 2018

Highly scalable distributed-memory sparse triangular solution algorithms.

Proceedings of the Eighth SIAM Workshop on Combinatorial Scientific Computing, 2018

Efficient Online Hyperparameter Learning for Traffic Flow Prediction.

Proceedings of the 21st International Conference on Intelligent Transportation Systems, 2018

A Unified Software Framework to Enable Solution of Traffic Assignment Problems at Extreme Scale.

Proceedings of the 21st International Conference on Intelligent Transportation Systems, 2018

A Communication-Avoiding 3D LU Factorization Algorithm for Sparse Matrices.

Richard W. Vuduc

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

A Study of Clustering Techniques and Hierarchical Matrix Formats for Kernel Ridge Regression.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

2017

xSDK Foundations: Toward an Extreme-scale Scientific Software Development Kit.

Supercomput. Front. Innov., 2017

A Robust Parallel Preconditioner for Indefinite Systems Using Hierarchical Matrices and Randomized Sampling.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016

A Parallel Geometric Multifrontal Solver Using Hierarchically Semiseparable Structure.

ACM Trans. Math. Softw., 2016

A Distributed-Memory Package for Dense Hierarchically Semi-Separable Matrix Computations Using Randomization.

ACM Trans. Math. Softw., 2016

An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling.

SIAM J. Sci. Comput., 2016

An algebraic multifrontal preconditioner that exploits the low-rank property.

Artem Napov

Numer. Linear Algebra Appl., 2016

Tuning the Coarse Space Construction in a Spectral AMG Solver.

Panayot S. Vassilevski

Delyan Kalchev

Proceedings of the International Conference on Computational Science 2016, 2016

Achieving High Parallel Efficiency on Modern Processors for X-Ray Scattering Data Analysis.

Abhinav Sarje

Nicholas Wright

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

2015

Extending Summation Precision for Network Reduction Operations.

George Michelogiannakis

John Shalf

Int. J. Parallel Program., 2015

An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling.

CoRR, 2015

Comparative Performance Analysis of Coarse Solvers for Algebraic Multigrid on Multicore and Manycore Architectures.

Panayot S. Vassilevski

Proceedings of the Parallel Processing and Applied Mathematics, 2015

A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Resilient Matrix Multiplication of Hierarchical Semi-Separable Matrices.

Brian Austin

Eric Roman

Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale, 2015

2014

Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods.

Marc Baboulin

François-Henry Rouet

Proceedings of the High Performance Computing for Computational Science - VECPAR 2014 - 11th International Conference, Eugene, OR, USA, June 30, 2014

High-Performance Inverse Modeling with Reverse Monte Carlo Simulations.

Abhinav Sarje

Alexander Hexemer

Proceedings of the 43rd International Conference on Parallel Processing, 2014

A Distributed CPU-GPU Sparse Direct Solver.

Richard W. Vuduc

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

2013

Efficient Scalable Algorithms for Solving Dense Linear Systems with Hierarchically Semiseparable Structures.

SIAM J. Sci. Comput., 2013

Tuning HipGISAXS on Multi and Many Core Supercomputers.

Abhinav Sarje

Alexander Hexemer

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

On Partitioning and Reordering Problems in a Hierarchically Parallel Hybrid Linear Solver.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

2012

Massively parallel X-ray scattering simulations.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

New Scheduling Strategies and Hybrid Programming for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Cluster Systems.

Ichitaro Yamazaki

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011

SuperLU.

Encyclopedia of Parallel Computing, 2011

A Supernodal Approach to Incomplete LU Factorization with Partial Pivoting.

Meiyue Shao

ACM Trans. Math. Softw., 2011

2010

Direction-Preserving and Schur-Monotonic Semiseparable Approximations of Symmetric Positive Definite Matrices.

Ming Gu

Panayot S. Vassilevski

SIAM J. Matrix Anal. Appl., 2010

Fast algorithms for hierarchically semiseparable matrices.

Jianlin Xia

Shivkumar Chandrasekaran

Ming Gu

Numer. Linear Algebra Appl., 2010

Particle-field decomposition and domain decomposition in parallel particle-in-cell beam dynamics simulation.

Ji Qiang

Comput. Phys. Commun., 2010

On Techniques to Improve Robustness and Scalability of a Parallel Hybrid Linear Solver.

Ichitaro Yamazaki

Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

2009

Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems.

ACM Trans. Math. Softw., 2009

Superfast Multifrontal Method for Large Structured Linear Systems of Equations.

Jianlin Xia

Shivkumar Chandrasekaran

Ming Gu

SIAM J. Matrix Anal. Appl., 2009

Performance Modeling Tools for Parallel Sparse Linear Algebra Computations.

Pietro Cicotti

Scott B. Baden

Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

2008

An Implementation and Evaluation of the AMLS Method for Sparse Eigenvalue Problems.

ACM Trans. Math. Softw., 2008

Evaluation of Sparse LU Factorization and Triangular Solution on Multicore Platforms.

Proceedings of the High Performance Computing for Computational Science, 2008

2007

Prospectus for a Dense Linear Algebra Software Library.

Handbook of Parallel Computing - Models, Algorithms and Applications, 2007

Parallel Symbolic Factorization for Sparse LU with Static Pivoting.

SIAM J. Sci. Comput., 2007

Towards an accurate performance modeling of parallel sparse factorization.

Appl. Algebra Eng. Commun. Comput., 2007

2006

Error bounds from extra-precise iterative refinement.

ACM Trans. Math. Softw., 2006

Unsymmetric Ordering Using A Constrained Markowitz Scheme.

Stéphane Pralet

SIAM J. Matrix Anal. Appl., 2006

Diagonal Markowitz Scheme with Local Symmetrization.

Esmond G. Ng

SIAM J. Matrix Anal. Appl., 2006

Prospectus for the Next LAPACK and ScaLAPACK Libraries.

Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

2005

An overview of SuperLU: Algorithms, implementation, and user interface.

ACM Trans. Math. Softw., 2005

An Algebraic Substructuring Method for Large-Scale Eigenvalue Calculation.

SIAM J. Sci. Comput., 2005

A Comparison of Three High-Precision Quadrature Schemes.

Karthik Jeyabalan

Exp. Math., 2005

2004

Algebraic Sub-structuring for Electromagnetic Applications.

Proceedings of the Applied Parallel Computing, 2004

Performance Analysis of Parallel Right-Looking Sparse LU Factorization on Two Dimensional Grids of Processors.

Proceedings of the Applied Parallel Computing, 2004

2003

SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems.

ACM Trans. Math. Softw., 2003

Impact of the implementation of MPI point-to-point communications on the performance of two general sparse solvers.

Iain S. Duff

Jean-Yves L'Excellent

Parallel Comput., 2003

2002

Design, implementation and testing of extended and mixed precision BLAS.

ACM Trans. Math. Softw., 2002

Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations.

SIAM Rev., 2002

A new scheduling algorithm for parallel sparse LU factorization with static pivoting.

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

High performance computing meets experimental mathematics.

David John Broadhurst

Yozo Hida

Brandon Thompson

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines.

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001

Analysis and comparison of two general sparse solvers for distributed memory computers.

Iain S. Duff

Jean-Yves L'Excellent

ACM Trans. Math. Softw., 2001

Solution of a three-body problem in quantum mechanics using sparse linear algebra on parallel computers.

Mark Baertschy

Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Performance and tuning of two distributed memory sparse solvers.

Iain S. Duff

Jean-Yves L'Excellent

Proceedings of the Tenth SIAM Conference on Parallel Processing for Scientific Computing, 2001

Algorithms for Quad-Double Precision Floating Point Arithmetic.

Yozo Hida

Proceedings of the 15th IEEE Symposium on Computer Arithmetic (Arith-15 2001), 2001

2000

Collection offers overview of research on structured matrices [Book Review].

Chiara Puglisi

IEEE Concurr., 2000

Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems.

Proceedings of the Parallel and Distributed Processing, 2000

Common Issues.

Henk A. van der Vorst

Templates for the Solution of Algebraic Eigenvalue Problems, 2000

1999

An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination.

James Weldon Demmel

John R. Gilbert

SIAM J. Matrix Anal. Appl., 1999

A Supernodal Approach to Sparse Partial Pivoting.

SIAM J. Matrix Anal. Appl., 1999

A Scalable Sparse Direct Solver Using Static Pivoting.

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

1998

Making Sparse Gaussian Elimination Scalable by Static Pivoting.

Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

1994

Faster Numerical Algorithms via Exception Handling.

IEEE Trans. Computers, 1994

1993

Decentralized optimal power pricing: the development of a parallel program.

IEEE Parallel Distributed Technol. Syst. Appl., 1993

1991

On a Massively Parallel e-Relaxation Algorithm for Linear Transformation Problems.
