Extending LLVM IR for DPC++ Matrix Support: A Case Study with Intel® Advanced Matrix Extensions (Intel® AMX).
Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021
A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge.
CoRR, 2018
Assessing One-to-One Parallelism Levels Mapping for OpenMP Offloading to GPUs.
Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, 2017
Towards Automatic HBM Allocation Using LLVM: A Case Study with Knights Landing.
Proceedings of the Third Workshop on the LLVM Compiler Infrastructure in HPC, 2016
Increasing Computational Asynchrony in OpenSHMEM with Active Messages.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016
Optimizing GPU Register Usage: Extensions to OpenACC and Compiler Optimizations.
Proceedings of the 45th International Conference on Parallel Processing, 2016
A Comparative Survey of the HPC and Big Data Paradigms: Analysis and Experiments.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems.
Parallel Comput., 2015
LLVM parallel intermediate representation: design and evaluation using OpenSHMEM communications.
Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015
Extending the Strided Communication Interface in OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015
OpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
A Team-Based Methodology of Memory Hierarchy-Aware Runtime Support in Coarray Fortran.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Native Mode-Based Optimizations of Remote Memory Accesses in OpenSHMEM for Intel Xeon Phi.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014
Automatic Resource-Constrained Static Task Parallelization: A Generic Approach. (Parallélisation automatique et statique de tâches sous contraintes de ressources : une approche générique).
PhD thesis, 2013
Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages.
Proceedings of the Languages and Compilers for Parallel Computing, 2012