Gheorghe-Teodor Bercea

CoRR, 2024

Porting HPC Applications to AMD Instinct™ MI300A using Unified Memory and OpenMP®.

Suyash Tandon

Leopold Grinberg

ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

2023

Reliable Actors with Retry Orchestration.

Olivier Tardieu

David Grove

Paul Castro

Jaroslaw Cwiklik

Edward A. Epstein

Proc. ACM Program. Lang., 2023

Specialized Kernels for Optimizing GPU Offload in OpenMP.

Dhruva R. Chakrabarti

Gregory Rodgers

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022

The Good, the Bad, and the Outliers: A Testing Framework for Decision Optimization Model Learning.

Orit Davidovich

Segev Wasserkrug

Proceedings of KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

2020

Hybrid CPU/GPU tasks optimized for concurrency in OpenMP.

Alexey Bataev

Leopold Grinberg

John K. O'Brien

IBM J. Res. Dev., 2020

An open-source solution to performance portability for Summit and Sierra supercomputers.

Alexey Bataev

John K. O'Brien

IBM J. Res. Dev., 2020

Compiling ONNX Neural Network Models Using MLIR.

Tung D. Le

Tong Chen

CoRR, 2020

2019

Sublinear Subwindow Search.

Max Reuter

CoRR, 2019

2017

Improving high performance computing using code generation and compilation techniques.

PhD thesis, 2017

Firedrake: Automating the Finite Element Method by Composing Abstractions.

Graham R. Markall

ACM Trans. Math. Softw., 2017

Implementing implicit OpenMP data sharing on GPUs.

Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, 2017

Efficient Fork-Join on GPUs Through Warp Specialization.

Arpith Chacko Jacob

Hyojin Sung

Samuel F. Antão

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016

Automated Generation and Symbolic Manipulation of Tensor Product Finite Elements.

Andrew T. T. McRae

Lawrence Mitchell

David A. Ham

Colin J. Cotter

SIAM J. Sci. Comput., 2016

A numbering algorithm for finite element on extruded meshes which avoids the unstructured mesh penalty.

CoRR, 2016

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support.

Proceedings of the 7th International Workshop on Performance Modeling, 2016

Offloading Support for OpenMP in Clang and LLVM.

Samuel F. Antão

Alexey Bataev

Proceedings of the Third Workshop on the LLVM Compiler Infrastructure in HPC, 2016

Early Experiences Porting Three Applications to OpenMP 4.5.

Bronis R. de Supinski

Erik W. Draeger

OpenMP: Memory, Devices, and Tasks, 2016

2015

Integrating GPU support for OpenMP offloading directives into Clang.

Samuel Antão

Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015

Performance analysis of OpenMP on a GPU using a CORAL proxy application.

Samuel F. Antão

Proceedings of the 6th International Workshop on Performance Modeling, 2015

2014

Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly.

Fabio Luporini

Ana Lucia Varbanescu

Florian Rathgeber

J. Ramanujam

David A. Ham

ACM Trans. Archit. Code Optim., 2014

COFFEE: an Optimizing Compiler for Finite Element Local Assembly.

Fabio Luporini

Ana Lucia Varbanescu

Florian Rathgeber

J. Ramanujam

David A. Ham

CoRR, 2014

Generalizing Run-Time Tiling with the Loop Chain Abstraction.

Michelle Mills Strout

Fabio Luporini

Christopher D. Krieger

Catherine Olschanowsky

J. Ramanujam