Greg Henry

According to our database, Greg Henry authored at least 34 papers between 1993 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Cascading GEMM: High Precision from Low Precision.
CoRR, 2023

2022
A BF16 FMA is All You Need for DNN Training.
IEEE Trans. Emerg. Top. Comput., 2022

FASE: A Fast, Accurate and Seamless Emulator for Custom Numerical Formats.
Proceedings of Machine Learning and Knowledge Discovery in Databases, 2022

Proposed Consistent Exception Handling for the BLAS and LAPACK.
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

2021
Dynamically Adapting Floating-Point Precision to Accelerate Deep Neural Network Training.
Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

2020
Harnessing Deep Learning via a Single Building Block.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019
High-Performance Deep Learning via a Single Building Block.
CoRR, 2019

Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations.
Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019

2018
Anatomy of high-performance deep learning convolutions on SIMD architectures.
Proceedings of the International Conference for High Performance Computing, 2018

2017
Mozart: Efficient Composition of Library Functions for Heterogeneous Execution.
Proceedings of Languages and Compilers for Parallel Computing, 2017

2016
Implementing Strassen's Algorithm with BLIS.
CoRR, 2016

Efficiency of High Order Spectral Element Methods on Petascale Architectures.
Proceedings of High Performance Computing - 31st International Conference, 2016

Strassen's algorithm reloaded.
Proceedings of the International Conference for High Performance Computing, 2016

LIBXSMM: accelerating small matrix multiplications by runtime code generation.
Proceedings of the International Conference for High Performance Computing, 2016

2013
Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2004
A Family of High-Performance Matrix Multiplication Algorithms.
Proceedings of Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.
Proceedings of Applied Parallel Computing, 2004

2002
Design, implementation and testing of extended and mixed precision BLAS.
ACM Trans. Math. Softw., 2002

Scientific computing on the Itanium® processor.
Sci. Program., 2002

A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures.
SIAM J. Sci. Comput., 2002

2001
FLAME: Formal Linear Algebra Methods Environment.
ACM Trans. Math. Softw., 2001

Scientific computing on the Itanium processor.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

A Family of High-Performance Matrix Multiplication Algorithms.
Proceedings of Computational Science - ICCS 2001, 2001

2000
High Performance Reactive Fluid Flow Simulations Using Adaptive Mesh Refinement on Thousands of Processors.
Proceedings of Supercomputing 2000, 2000

1998
Application of a High Performance Parallel Eigensolver to Electronic Structure Calculations.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

1997
High Performance Software on Intel Pentium Pro Processors or Micro-Ops to TeraFLOPS.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

On a Distributed Design and Implementation for a Matrix Equation.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

A Distributed Memory Implementation of the Nonsymmetric QR Algorithm.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

ScaLAPACK: A Linear Algebra Library for Message-Passing Computers.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

1996
Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality.
SIAM J. Sci. Comput., 1996

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance.
Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

1995
A Parallel Unsymmetric Inverse Iteration Solver.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

1994
Applications of boundary element methods on the Intel Paragon.
Proceedings of Supercomputing '94, 1994

1993
Improving the Unsymmetric Parallel QR Algorithm on Vector Machines.
Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993

