James Demmel

ORCID: 0000-0002-0550-5476

Affiliations:
  • University of California, Berkeley, USA


According to our database, James Demmel authored at least 246 papers between 1985 and 2024.

Awards

IEEE Fellow

IEEE Fellow 2002, "For contributions to the field of computational mathematics and the development of mathematical software."

Bibliography

2024
Distributed and Joint Evidential K-Nearest Neighbor Classification.
IEEE Trans. Knowl. Data Eng., November, 2024

Scalable Evidential K-Nearest Neighbor Classification on Big Data.
IEEE Trans. Big Data, June, 2024

On Multilinear Inequalities of Hölder-Brascamp-Lieb Type for Torsion-Free Discrete Abelian Groups.
J. Log. Anal., 2024

Non-smooth Bayesian optimization in tuning scientific applications.
Int. J. High Perform. Comput. Appl., 2024

WallFacer: Guiding Transformer Model Training Out of the Long-Context Dark Forest with N-body Problem.
CoRR, 2024

LPSim: Large Scale Multi-GPU Parallel Computing based Regional Scale Traffic Simulation Framework.
CoRR, 2024

Distributed-Memory Randomized Algorithms for Sparse Tensor CP Decomposition.
Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024

Fast multiplication of random dense matrices with sparse matrices.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
An Improved Analysis and Unified Perspective on Deterministic and Randomized Low-Rank Matrix Approximation.
SIAM J. Matrix Anal. Appl., June, 2023

Nearly Optimal Block-Jacobi Preconditioning.
SIAM J. Matrix Anal. Appl., March, 2023

CholeskyQR with Randomization and Pivoting for Tall Matrices (CQRRPT).
CoRR, 2023

Fast multiplication of random dense matrices with fixed sparse matrices.
CoRR, 2023

Surrogate-based Autotuning for Randomized Sketching Algorithms in Regression Problems.
CoRR, 2023

Computron: Serving Distributed Deep Learning Models with Model Parallel Swapping.
CoRR, 2023

Generalized Pseudospectral Shattering and Inverse-Free Matrix Pencil Diagonalization.
CoRR, 2023

Randomized Numerical Linear Algebra : A Perspective on the Field With an Eye to Software.
CoRR, 2023

Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Harnessing the Crowd for Autotuning High-Performance Computing Applications.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
Hybrid Models for Mixed Variables in Bayesian Optimization.
CoRR, 2022

Communication bounds for convolutional neural networks.
Proceedings of the PASC '22: Platform for Advanced Scientific Computing Conference, Basel, Switzerland, June 27, 2022

Distributed-Memory Sparse Kernels for Machine Learning.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Proposed Consistent Exception Handling for the BLAS and LAPACK.
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

2021
Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions.
SIAM J. Sci. Comput., 2021

Parallel and Communication Avoiding Least Angle Regression.
SIAM J. Sci. Comput., 2021

Non-smooth Bayesian Optimization in Tuning Problems.
CoRR, 2021

Fast Bilinear Algorithms for Symmetric Tensor Contractions.
Comput. Methods Appl. Math., 2021

Communication-avoiding kernel ridge regression on parallel and distributed systems.
CCF Trans. High Perform. Comput., 2021

Auto-Precision Scaling for Distributed Deep Learning.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Dynamic scaling for low-precision learning.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

GPTune: multitask learning for autotuning exascale applications.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Enhancing Autotuning Capability with a History Database.
Proceedings of the 14th IEEE International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2021

CoSA: Scheduling by Constrained Optimization for Spatial Accelerators.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
Bidiagonal SVD Computation via an Associated Tridiagonal Eigenproblem.
ACM Trans. Math. Softw., 2020

Algorithms for Efficient Reproducible Floating Point Summation.
ACM Trans. Math. Softw., 2020

Fast LSTM by dynamic decomposition on cloud and distributed systems.
Knowl. Inf. Syst., 2020

The Limit of the Batch Size.
CoRR, 2020

Communication-Optimal Tilings for Projective Nested Loops with Arbitrary Bounds.
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.
Proceedings of the 8th International Conference on Learning Representations, 2020

Rethinking the Value of Asynchronous Solvers for Distributed Deep Learning.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2020

Avoiding Communication in Logistic Regression.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

2019
Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs.
IEEE Trans. Parallel Distributed Syst., 2019

Avoiding Communication in Primal and Dual Block Coordinate Descent Methods.
SIAM J. Sci. Comput., 2019

An improved analysis and unified perspective on deterministic and randomized low rank matrix approximations.
CoRR, 2019

A Generalized Randomized Rank-Revealing Factorization.
CoRR, 2019

Multitask and Transfer Learning for Autotuning Exascale Applications.
CoRR, 2019

Parallel and Communication Avoiding Least Angle Regression.
CoRR, 2019

Reducing BERT Pre-Training Time from 3 Days to 76 Minutes.
CoRR, 2019

Large-batch training for LSTM and beyond.
Proceedings of the International Conference for High Performance Computing, 2019

Fast LSTM Inference by Dynamic Decomposition on Cloud Systems.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

2018
Low Rank Approximation of a Sparse Matrix Based on LU Factorization with Column and Row Tournament Pivoting.
SIAM J. Sci. Comput., 2018

Communication-Optimal Convolutional Neural Nets.
CoRR, 2018

A 3D Parallel Algorithm for QR Decomposition.
Proceedings of the 30th Symposium on Parallelism in Algorithms and Architectures, 2018

Avoiding Synchronization in First-Order Methods for Sparse Convex Optimization.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Accurate, Fast and Scalable Kernel Ridge Regression on Parallel and Distributed Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

ImageNet Training in Minutes.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Reducing Communication in Proximal Newton Methods for Sparse Least Squares Problems.
Proceedings of the 47th International Conference on Parallel Processing, 2018

Augmented Arithmetic Operations Proposed for IEEE-754 2018.
Proceedings of the 25th IEEE Symposium on Computer Arithmetic, 2018

2017
Design and Implementation of a Communication-Optimal Classifier for Distributed Kernel Support Vector Machines.
IEEE Trans. Parallel Distributed Syst., 2017

Avoiding Communication in Proximal Methods for Convex Optimization Problems.
CoRR, 2017

100-epoch ImageNet Training with AlexNet in 24 Minutes.
CoRR, 2017

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem.
Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Scaling deep learning on GPU and Knights Landing clusters.
Proceedings of the International Conference for High Performance Computing, 2017

Runtime Data Layout Scheduling for Machine Learning Dataset.
Proceedings of the 46th International Conference on Parallel Processing, 2017

2016
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication.
SIAM J. Sci. Comput., 2016

Implementing a Collaborative Online Course to Extend Access to HPC Skills.
Comput. Sci. Eng., 2016

Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies.
CoRR, 2016

Parallelepipeds obtaining HBL lower bounds.
CoRR, 2016

Network Topologies and Inevitable Contention.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Asynchronous Parallel Greedy Coordinate Descent.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Write-Avoiding Algorithms.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Floating-point precision tuning using blame analysis.
Proceedings of the 38th International Conference on Software Engineering, 2016

Matrix factorizations at scale: A comparison of scientific data analytics in Spark and C+MPI using three case studies.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

2015
Avoiding Communication in Successive Band Reduction.
ACM Trans. Parallel Comput., 2015

Parallel Reproducible Summation.
IEEE Trans. Computers, 2015

Communication Avoiding Rank Revealing QR Factorization with Column Pivoting.
SIAM J. Matrix Anal. Appl., 2015

Accuracy of the s-Step Lanczos Method for the Symmetric Eigenproblem in Finite Precision.
SIAM J. Matrix Anal. Appl., 2015

Reconstructing Householder vectors from Tall-Skinny QR.
J. Parallel Distributed Comput., 2015

Extending access to HPC skills through a blended online course.
Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, St. Louis, MO, USA, July 26, 2015

CA-SVM: Communication-Avoiding Support Vector Machines on Distributed Systems.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Reproducible Tall-Skinny QR.
Proceedings of the 22nd IEEE Symposium on Computer Arithmetic, 2015

2014
A Residual Replacement Strategy for Improving the Maximum Attainable Accuracy of s-Step Krylov Subspace Methods.
SIAM J. Matrix Anal. Appl., 2014

Communication-Avoiding Symmetric-Indefinite Factorization.
SIAM J. Matrix Anal. Appl., 2014

A massively parallel tensor contraction framework for coupled-cluster computations.
J. Parallel Distributed Comput., 2014

Communication costs of Strassen's matrix multiplication.
Commun. ACM, 2014

Communication lower bounds and optimal algorithms for numerical linear algebra.
Acta Numer., 2014

Architecting an autograder for parallel code.
Proceedings of the Annual Conference of the Extreme Science and Engineering Discovery Environment, 2014

Tradeoffs between synchronization, communication, and computation in parallel linear algebra computations.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Reconstructing Householder Vectors from Tall-Skinny QR.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Author retrospective for optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C coding methodology.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

2013
Avoiding Communication in Nonsymmetric Lanczos-Based Krylov Subspace Methods.
SIAM J. Sci. Comput., 2013

LU Factorization with Panel Rank Revealing Pivoting and Its Communication Avoiding Version.
SIAM J. Matrix Anal. Appl., 2013

Communication lower bounds and optimal algorithms for programs that reference arrays - Part 1.
CoRR, 2013

Providing a supported online course on parallel computing.
Proceedings of the Extreme Science and Engineering Discovery Environment: Gateway to Discovery, 2013

Communication efficient Gaussian elimination with partial pivoting using a shape morphing data layout.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Communication optimal parallel multiplication of sparse random matrices.
Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures, 2013

Precimonious: tuning assistant for floating-point precision.
Proceedings of the International Conference for High Performance Computing, 2013

Exploiting Data Sparsity in Parallel Matrix Powers Computations.
Proceedings of the Parallel Processing and Applied Mathematics, 2013

Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Minimizing Communication in All-Pairs Shortest Paths.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Perfect Strong Scaling Using No Additional Energy.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Communication-Avoiding Algorithms for Linear Algebra and Beyond.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Implementing a Blocked Aasen's Algorithm with a Dynamic Scheduler on Multicore Architectures.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Direct QR factorizations for tall-and-skinny matrices in MapReduce architectures.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013

Numerical Reproducibility and Accuracy at ExaScale.
Proceedings of the 21st IEEE Symposium on Computer Arithmetic, 2013

Fast Reproducible Floating-Point Summation.
Proceedings of the 21st IEEE Symposium on Computer Arithmetic, 2013

2012
Fast ℓ₁-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime.
IEEE Trans. Medical Imaging, 2012

Communication-optimal Parallel and Sequential QR and LU Factorizations.
SIAM J. Sci. Comput., 2012

Graph expansion and communication costs of fast matrix multiplication.
J. ACM, 2012

Strong Scaling of Matrix Multiplication Algorithms and Memory-Independent Communication Lower Bounds
CoRR, 2012

Matrix Multiplication on Multidimensional Torus Networks.
Proceedings of the High Performance Computing for Computational Science, 2012

Communication-optimal parallel algorithm for Strassen's matrix multiplication.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Brief announcement: strong scaling of matrix multiplication algorithms and memory-independent communication lower bounds.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Communication-avoiding parallel Strassen: implementation and performance.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Poster: Beating MKL and ScaLAPACK at Rectangular Matrix Multiplication Using the BFS/DFS Approach.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Communication avoiding algorithms.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Communication avoiding successive band reduction.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Graph Expansion Analysis for Communication Costs of Fast Rectangular Matrix Multiplication.
Proceedings of the Design and Analysis of Algorithms, 2012

2011
SuperLU.
Proceedings of the Encyclopedia of Parallel Computing, 2011

CALU: A Communication Optimal LU Factorization Algorithm.
SIAM J. Matrix Anal. Appl., 2011

Minimizing Communication in Numerical Linear Algebra.
SIAM J. Matrix Anal. Appl., 2011

Graph expansion and communication costs of fast matrix multiplication: regular submission.
Proceedings of the SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011

Brief announcement: communication bounds for heterogeneous architectures.
Proceedings of the SPAA 2011: Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2011

Accurate and efficient expression evaluation and linear algebra, or why it can be easier to compute accurate eigenvalues of a Vandermonde matrix than the accurate sum of 3 numbers.
Proceedings of the SNC 2011, 2011

Improving communication performance in dense linear algebra via topology aware collectives.
Proceedings of the Conference on High Performance Computing Networking, 2011

Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Communication-Avoiding QR Decomposition for GPUs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

On improving trust-region variable projection algorithms for separable nonlinear least squares learning.
Proceedings of the 2011 International Joint Conference on Neural Networks, 2011

Rethinking algorithms for future architectures: Communication-avoiding algorithms.
Proceedings of the 2011 IEEE Hot Chips 23 Symposium (HCS), 2011

Communication-Optimal Parallel 2.5D Matrix Multiplication and LU Factorization Algorithms.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Avoiding Communication in Numerical Linear Algebra.
Proceedings of the Thirteenth Workshop on Algorithm Engineering and Experiments, 2011

2010
Communication-optimal Parallel and Sequential Cholesky Decomposition.
SIAM J. Sci. Comput., 2010

Minimizing Communication for Eigenproblems and the Singular Value Decomposition
CoRR, 2010

Brief announcement: Lower bounds on communication for sparse Cholesky factorization of a model problem.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010

2009
Extra-Precise Iterative Refinement for Overdetermined Least Squares Problems.
ACM Trans. Math. Softw., 2009

Nonnegative Diagonals and High Performance on Low-Profile Matrices from Householder QR.
SIAM J. Sci. Comput., 2009

Optimization of sparse matrix-vector multiplication on emerging multicore platforms.
Parallel Comput., 2009

Minimizing Communication in Linear Algebra
CoRR, 2009

A view of the parallel computing landscape.
Commun. ACM, 2009

Communication-optimal parallel and sequential Cholesky decomposition: extended abstract.
Proceedings of the SPAA 2009: Proceedings of the 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2009

Minimizing communication in sparse matrix solvers.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

2008
Algorithm 880: A testing infrastructure for symmetric tridiagonal eigensolvers.
ACM Trans. Math. Softw., 2008

Cache efficient bidiagonalization using BLAS 2.5 operators.
ACM Trans. Math. Softw., 2008

Performance and Accuracy of LAPACK's Symmetric Tridiagonal Eigensolvers.
SIAM J. Sci. Comput., 2008

Continuation of Invariant Subspaces in Large Bifurcation Problems.
SIAM J. Sci. Comput., 2008

Sparse SOS Relaxations for Minimizing Functions that are Summations of Small Polynomials.
SIAM J. Optim., 2008

Global minimization of rational functions and the nearest GCDs.
J. Glob. Optim., 2008

Communication-avoiding parallel and sequential QR factorizations
CoRR, 2008

Benchmarking GPUs to tune dense linear algebra.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Communication avoiding Gaussian elimination.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Avoiding communication in sparse matrix computations.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
Prospectus for a Dense Linear Algebra Software Library.
Proceedings of the Handbook of Parallel Computing - Models, Algorithms and Applications, 2007

Parallel Symbolic Factorization for Sparse LU with Static Pivoting.
SIAM J. Sci. Comput., 2007

Fast matrix multiplication is stable.
Numerische Mathematik, 2007

Fast linear algebra is stable.
Numerische Mathematik, 2007

Accurate and Efficient Expression Evaluation and Linear Algebra
CoRR, 2007

When cache blocking of sparse matrix vector multiply works and why.
Appl. Algebra Eng. Commun. Comput., 2007

Health monitoring of civil infrastructures using wireless sensor networks.
Proceedings of the 6th International Conference on Information Processing in Sensor Networks, 2007

2006
Error bounds from extra-precise iterative refinement.
ACM Trans. Math. Softw., 2006

Minimizing Polynomials via Sum of Squares over the Gradient Ideal.
Math. Program., 2006

Accurate and efficient evaluation of Schur and Jack functions.
Math. Comput., 2006

Wireless sensor networks for structural health monitoring.
Proceedings of the 4th International Conference on Embedded Networked Sensor Systems, 2006

Automatic Performance Tuning for the Multi-section with Multiple Eigenvalues Method for Symmetric Tridiagonal Eigenproblems.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Prospectus for the Next LAPACK and ScaLAPACK Libraries.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

2005
The Accurate and Efficient Solution of a Totally Positive Generalized Vandermonde Linear System.
SIAM J. Matrix Anal. Appl., 2005

Minimum Ellipsoid Bounds for Solutions of Polynomial Systems via Sum of Squares.
J. Glob. Optim., 2005

Toward accurate polynomial evaluation in rounded arithmetic
CoRR, 2005

Bifurcation Analysis of Large Equilibrium Systems in Matlab.
Proceedings of the Computational Science, 2005

Toward accurate polynomial evaluation in rounded arithmetic (short report).
Proceedings of the Algebraic and Numerical Algorithms and Computer-assisted Proofs, 2005

2004
Accurate and Efficient Floating Point Summation.
SIAM J. Sci. Comput., 2004

Accurate SVDs of weakly diagonally dominant M-matrices.
Numerische Mathematik, 2004

Fast and Accurate Floating Point Summation with Application to Computational Geometry.
Numer. Algorithms, 2004

Statistical Models for Empirical Search-Based Performance Tuning.
Int. J. High Perform. Comput. Appl., 2004

Performance Tuning of Matrix Triple Products Based on Matrix Structure.
Proceedings of the Applied Parallel Computing, 2004

Model Reduction for RF MEMS Simulation.
Proceedings of the Applied Parallel Computing, 2004

Performance Models for Evaluation and Automatic Tuning of Symmetric Sparse Matrix-Vector Multiply.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003
SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems.
ACM Trans. Math. Softw., 2003

On structure-exploiting trust-region regularized nonlinear least squares algorithms for neural-network learning.
Neural Networks, 2003

Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian-Vector Multiply.
Proceedings of the Advances in Neural Information Processing Systems 16, 2003

Memory Hierarchy Optimizations and Performance Bounds for Sparse AᵀAx.
Proceedings of the Computational Science - ICCS 2003, 2003

2002
Design, implementation and testing of extended and mixed precision BLAS.
ACM Trans. Math. Softw., 2002

On computing Givens rotations reliably and efficiently.
ACM Trans. Math. Softw., 2002

Performance optimizations and bounds for sparse matrix-vector multiply.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

2001
On the Complexity of Computing Error Bounds.
Found. Comput. Math., 2001

Statistical Models for Automatic Performance Tuning.
Proceedings of the Computational Science - ICCS 2001, 2001

A Data Broker for Distributed Computing Environments.
Proceedings of the Computational Science - ICCS 2001, 2001

2000
Computing Connecting Orbits via an Improved Algorithm for Continuing Invariant Subspaces.
SIAM J. Sci. Comput., 2000

Accurate Singular Value Decompositions of Structured Matrices.
SIAM J. Matrix Anal. Appl., 2000

Code Generators for Automatic Tuning of Numerical Kernels: Experiences with FFTW.
Proceedings of the Semantics, 2000

On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems.
Proceedings of the Advances in Neural Information Processing Systems 13, 2000

Common Issues.
Proceedings of the Templates for the Solution of Algebraic Eigenvalue Problems, 2000

Singular Value Decomposition.
Proceedings of the Templates for the Solution of Algebraic Eigenvalue Problems, 2000

A Brief Tour of Eigenproblems.
Proceedings of the Templates for the Solution of Algebraic Eigenvalue Problems, 2000

Non-Hermitian Eigenvalue Problems.
Proceedings of the Templates for the Solution of Algebraic Eigenvalue Problems, 2000

1999
An Asynchronous Parallel Supernodal Algorithm for Sparse Gaussian Elimination.
SIAM J. Matrix Anal. Appl., 1999

A Supernodal Approach to Sparse Partial Pivoting.
SIAM J. Matrix Anal. Appl., 1999

Making Sparse Matrix Computations Scalable (Invited Talk Abstract).
Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, 1999

Parallel Multigrid Solver for 3D Unstructured Finite Element Problems.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

A Scalable Sparse Direct Solver Using Static Pivoting.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

LAPACK Users' Guide, Third Edition
Software, Environments and Tools, SIAM, ISBN: 978-0-89871-960-4, 1999

1998
Using the Matrix Sign Function to Compute Invariant Subspaces.
SIAM J. Matrix Anal. Appl., January, 1998

Programming Tools and Environments.
Commun. ACM, 1998

Making Sparse Gaussian Elimination Scalable by Static Pivoting.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

1997
Practical Experience in the Numerical Dangers of Heterogeneous Computing.
ACM Trans. Math. Softw., 1997

The Spectral Decomposition of Nonsymmetric Matrices on Distributed Memory Parallel Computers.
SIAM J. Sci. Comput., 1997

Models and Scheduling Algorithms for Mixed Data and Task Parallel Programs.
J. Parallel Distributed Comput., 1997

ScaLAPACK: A Linear Algebra Library for Message-Passing Computers.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

Optimizing Matrix Multiply Using PHiPAC: A Portable, High-Performance, ANSI C Coding Methodology.
Proceedings of the 11th international conference on Supercomputing, 1997

Using PHiPAC to speed error back-propagation learning.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

Applied Numerical Linear Algebra.
SIAM, ISBN: 978-0-898713-89-3, 1997

1996
ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

Practical Experience in the Dangers of Heterogeneous Computing.
Proceedings of the Applied Parallel Computing, 1996

1995
Stability of block LU factorization.
Numer. Linear Algebra Appl., 1995

Algorithms for Intersecting Parametric and Algebraic Curves II: Multiple Intersections.
CVGIP Graph. Model. Image Process., 1995

Modeling the Benefits of Mixed Data and Task Parallelism.
Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures, 1995

Performance of a Parallel Global Atmospheric Chemical Tracer Model.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995

The Performance of Finding Eigenvalues and Eigenvectors of Dense Symmetric Matrices on Distributed Memory Computers.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance.
Proceedings of the Applied Parallel Computing, 1995

Templates for Linear Algebra Problems.
Proceedings of the Computer Science Today: Recent Trends and Developments, 1995

1994
Algorithms for intersecting parametric and algebraic curves I: simple intersections.
ACM Trans. Graph., 1994

Faster Numerical Algorithms via Exception Handling.
IEEE Trans. Computers, 1994

Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
Other Titles in Applied Mathematics, SIAM, ISBN: 978-1-61197-153-8, 1994

1993
Improved Error Bounds for Underdetermined System Solvers.
SIAM J. Matrix Anal. Appl., January, 1993

The generalized Schur decomposition of an arbitrary pencil A-λB - robust software with error bounds and applications. Part II: software and applications.
ACM Trans. Math. Softw., 1993

The generalized Schur decomposition of an arbitrary pencil A-λB - robust software with error bounds and applications. Part I: theory and algorithms.
ACM Trans. Math. Softw., 1993

On computing condition numbers for the nonsymmetric eigenproblem.
ACM Trans. Math. Softw., 1993

Computing the Generalized Singular Value Decomposition.
SIAM J. Sci. Comput., 1993

A New Algorithm for the Symmetric Tridiagonal Eigenvalue Problem.
J. Complex., 1993

LAPACK for Distributed Memory Architectures: The Next Generation.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993

Design of a Parallel Nonsymmetric Eigenroutine Toolbox, Part I.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993

1992
Stability of block algorithms with fast level-3 BLAS.
ACM Trans. Math. Softw., 1992

Jacobi's Method is More Accurate than QR.
SIAM J. Matrix Anal. Appl., 1992

The Componentwise Distance to the Nearest Singular Matrix.
SIAM J. Matrix Anal. Appl., 1992

1991
LAPACK: A portable linear algebra library for high-performance computers.
Concurr. Pract. Exp., 1991

1990
Accurate Singular Values of Bidiagonal Matrices.
SIAM J. Sci. Comput., 1990

Matrix Computations; Second Edition (Gene Golub and Charles F. Van Loan).
SIAM Rev., 1990

LAPACK: a portable linear algebra library for high-performance computers.
Proceedings of Supercomputing '90, New York, NY, USA, November 12-16, 1990

1989
On a Block Implementation of Hessenberg Multishift QR Iteration.
Int. J. High Speed Comput., 1989

Optimal three finger grasps.
Proceedings of the 1989 IEEE International Conference on Robotics and Automation, 1989

1988
Theoretical and experimental studies using a multifinger planar manipulator.
Proceedings of the 1988 IEEE International Conference on Robotics and Automation, 1988

1987
The geometry of ill-conditioning.
J. Complex., 1987

Three methods for refining estimates of invariant subspaces.
Computing, 1987

On error analysis in arithmetic with varying relative precision.
Proceedings of the 8th IEEE Symposium on Computer Arithmetic, 1987

1985
An interval algorithm for solving systems of linear equations to prespecified accuracy.
Computing, 1985

