Gheorghe-Teodor Bercea

Orcid: 0000-0003-4331-4360

According to our database, Gheorghe-Teodor Bercea authored at least 23 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Porting HPC Applications to AMD Instinct™ MI300A Using Unified Memory and OpenMP.
CoRR, 2024

Porting HPC Applications to AMD Instinct™ MI300A using Unified Memory and OpenMP®.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

2023
Reliable Actors with Retry Orchestration.
Proc. ACM Program. Lang., 2023

Specialized Kernels for Optimizing GPU Offload in OpenMP.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022
The Good, the Bad, and the Outliers: A Testing Framework for Decision Optimization Model Learning.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

2020
Hybrid CPU/GPU tasks optimized for concurrency in OpenMP.
IBM J. Res. Dev., 2020

An open-source solution to performance portability for Summit and Sierra supercomputers.
IBM J. Res. Dev., 2020

Compiling ONNX Neural Network Models Using MLIR.
CoRR, 2020

2019
Sublinear Subwindow Search.
CoRR, 2019

2017
Improving high performance computing using code generation and compilation techniques.
PhD thesis, 2017

Firedrake: Automating the Finite Element Method by Composing Abstractions.
ACM Trans. Math. Softw., 2017

Implementing implicit OpenMP data sharing on GPUs.
Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC, 2017

Efficient Fork-Join on GPUs Through Warp Specialization.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016
Automated Generation and Symbolic Manipulation of Tensor Product Finite Elements.
SIAM J. Sci. Comput., 2016

A numbering algorithm for finite element on extruded meshes which avoids the unstructured mesh penalty.
CoRR, 2016

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support.
Proceedings of the 7th International Workshop on Performance Modeling, 2016

Offloading Support for OpenMP in Clang and LLVM.
Proceedings of the Third Workshop on the LLVM Compiler Infrastructure in HPC, 2016


2015
Integrating GPU support for OpenMP offloading directives into Clang.
Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015

Performance analysis of OpenMP on a GPU using a CORAL proxy application.
Proceedings of the 6th International Workshop on Performance Modeling, 2015

2014
Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly.
ACM Trans. Archit. Code Optim., 2014

COFFEE: an Optimizing Compiler for Finite Element Local Assembly.
CoRR, 2014

Generalizing Run-Time Tiling with the Loop Chain Abstraction.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

