Greg Henry

According to our database¹, Greg Henry authored at least 35 papers between 1993 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

Deconstructing HPL-MxP Benchmark: A Numerical Perspective.

Proceedings of the Euro-Par 2024: Parallel Processing, 2024

2023

Cascading GEMM: High Precision from Low Precision.

Devangi N. Parikh

Robert A. van de Geijn

Greg M. Henry

CoRR, 2023

2022

A BF16 FMA is All You Need for DNN Training.

IEEE Trans. Emerg. Top. Comput., 2022

FASE: A Fast, Accurate and Seamless Emulator for Custom Numerical Formats.

Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2022

Proposed Consistent Exception Handling for the BLAS and LAPACK.

Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

2021

Dynamically Adapting Floating-Point Precision to Accelerate Deep Neural Network Training.

Proceedings of the 20th IEEE International Conference on Machine Learning and Applications, 2021

2020

Harnessing Deep Learning via a Single Building Block.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019

High-Performance Deep Learning via a Single Building Block.

CoRR, 2019

Leveraging the bfloat16 Artificial Intelligence Datatype For Higher-Precision Computations.

Greg Henry

Ping Tak Peter Tang

Alexander Heinecke

Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019

2018

Anatomy of high-performance deep learning convolutions on SIMD architectures.

Proceedings of the International Conference for High Performance Computing, 2018

2017

Mozart: Efficient Composition of Library Functions for Heterogeneous Execution.

Proceedings of the Languages and Compilers for Parallel Computing, 2017

2016

Implementing Strassen's Algorithm with BLIS.

Jianyu Huang

Tyler M. Smith

Greg M. Henry

Robert A. van de Geijn

CoRR, 2016

Efficiency of High Order Spectral Element Methods on Petascale Architectures.

Proceedings of the High Performance Computing - 31st International Conference, 2016

Strassen's algorithm reloaded.

Jianyu Huang

Tyler M. Smith

Greg M. Henry

Robert A. van de Geijn

Proceedings of the International Conference for High Performance Computing, 2016

LIBXSMM: accelerating small matrix multiplications by runtime code generation.

Proceedings of the International Conference for High Performance Computing, 2016

2013

Design and Implementation of the Linpack Benchmark for Single and Multi-node Systems Based on Intel® Xeon Phi Coprocessor.

Alexander Heinecke

Karthikeyan Vaidyanathan

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2004

A Family of High-Performance Matrix Multiplication Algorithms.

John A. Gunnels

Fred G. Gustavson

Greg Henry

Robert A. van de Geijn

Proceedings of the Applied Parallel Computing, 2004

Rapid Development of High-Performance Linear Algebra Libraries.

Enrique S. Quintana-Ortí

Robert A. van de Geijn

Proceedings of the Applied Parallel Computing, 2004

2002

Design, implementation and testing of extended and mixed precision BLAS.

ACM Trans. Math. Softw., 2002

Scientific computing on the Itanium® processor.

Sci. Program., 2002

A Parallel Implementation of the Nonsymmetric QR Algorithm for Distributed Memory Architectures.

Greg Henry

David S. Watkins

Jack J. Dongarra

SIAM J. Sci. Comput., 2002

2001

FLAME: Formal Linear Algebra Methods Environment.

John A. Gunnels

Fred G. Gustavson

Greg Henry

Robert A. van de Geijn

ACM Trans. Math. Softw., 2001

Scientific computing on the Itanium processor.

Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

A Family of High-Performance Matrix Multiplication Algorithms.

John A. Gunnels

Greg Henry

Robert A. van de Geijn

Proceedings of the Computational Science - ICCS 2001, 2001

2000

High Performance Reactive Fluid Flow Simulations Using Adaptive Mesh Refinement on Thousands of Processors.

Proceedings of Supercomputing 2000, 2000

1998

Application of a High Performance Parallel Eigensolver to Electronic Structure Calculations.

Mark P. Sears

Ken Stanley

Greg Henry

Proceedings of the ACM/IEEE Conference on Supercomputing, 1998

1997

High Performance Software on Intel Pentium Pro Processors or Micro-Ops to TeraFLOPS.

Bruce Greer

Greg Henry

Proceedings of the ACM/IEEE Conference on Supercomputing, 1997

On a Distributed Design and Implementation for a Matrix Equation.

Greg Henry

Avijit Purkayastha

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

A Distributed Memory Implementation of the Nonsymmetric QR Algorithm.

Jack J. Dongarra

Greg Henry

David S. Watkins

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

ScaLAPACK: A Linear Algebra Library for Message-Passing Computers.

Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997

1996

Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality.

Greg Henry

Robert A. van de Geijn

SIAM J. Sci. Comput., 1996

ScaLAPACK: A Portable Linear Algebra Library for Distributed Memory Computers - Design Issues and Performance.

Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996

1995

A Parallel Unsymmetric Inverse Iteration Solver.

Greg Henry

Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

1994

Applications of boundary element methods on the Intel Paragon.

Proceedings of Supercomputing '94, 1994

1993

Improving the Unsymmetric Parallel QR Algorithm on Vector Machines.

Greg Henry

Proceedings of the Sixth SIAM Conference on Parallel Processing for Scientific Computing, 1993
