2024
Developing an Interactive OpenMP Programming Book with Large Language Models.
Proceedings of the Advancing OpenMP for Future Accelerators, 2024
RTune: Towards Automated and Coordinated Optimization of Computing and Computational Objectives of Parallel Iterative Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024
2023
Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks.
Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, 2023
2022
Generating and Analyzing Program Call Graphs using Ontology.
Proceedings of the IEEE/ACM Workshop on Programming and Performance Visualization Tools, 2022
Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures.
PMAM@PPoPP 2022: Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores, Virtual Event / Seoul, Republic of Korea, April 2, 2022
Stacking Feature Maps of Multi-scaled Medical Images in U-Net for 3D Head and Neck Tumor Segmentation.
Proceedings of the Head and Neck Tumor Segmentation and Outcome Prediction, 2022
Applying Quadratic Penalty Method for Intensity-Based Deformable Image Registration on BraTS-Reg Challenge 2022.
Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2022
Experimenting FedML and NVFLARE for Federated Tumor Segmentation Challenge.
Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2022
UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
Extending OpenMP for Machine Learning-Driven Adaptation.
Proceedings of the Accelerator Programming Using Directives - 8th International Workshop, 2021
RDS: a cloud-based metaservice for detecting data races in parallel programs.
Proceedings of the UCC '21: 2021 IEEE/ACM 14th International Conference on Utility and Cloud Computing, Leicester, United Kingdom, December 6, 2021
CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
An Ensemble Approach to Automatic Brain Tumor Segmentation.
Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2021
2020
Enhancing DataRaceBench for Evaluating Data Race Detection Tools.
Proceedings of the 4th IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2020
Extending FreeCompilerCamp.org as an Online Self-Learning Platform for Compiler Development.
Proceedings of the IEEE/ACM Workshop on Education for High-Performance Computing, 2020
Supporting Data Shuffle Between Threads in OpenMP.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020
2019
Extending OpenMP Map Clause to Bridge Storage and Device Memory.
Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019
Ompparser: A Standalone and Unified OpenMP Parser.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019
Extending OpenMP Metadirective Semantics for Runtime Adaptation.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019
2018
A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge.
CoRR, 2018
2017
Principles of Memory-Centric Programming for High Performance Computing.
Proceedings of the Workshop on Memory Centric Programming for HPC, 2017
Evaluation of Knight Landing High Bandwidth Memory for HPC Workloads.
Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017
HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Comparison of Threading Programming Models.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
2016
Compiler transformation of nested loops for general purpose GPUs.
Concurr. Comput. Pract. Exp., 2016
A Proposal to OpenMP for Addressing the CPU Oversubscription Challenge.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
Comparison of Spark Resource Managers and Distributed File Systems.
Proceedings of the 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), 2016
2015
Programming Models, Languages, and Compilers for Manycore and Heterogeneous Architectures.
Sci. Program., 2015
Supporting multiple accelerators in high-level programming models.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015
2014
Reduction Operations in Parallel Loops for GPGPUs.
Proceedings of the 2014 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2014
NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 2014
Predicting Cache Contention for Multithread Applications at Compile Time.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
2013
Compile Time Modeling of Off-Chip Memory Bandwidth for Parallel Loops.
Proceedings of the Languages and Compilers for Parallel Computing, 2013
Compiling a High-Level Directive-Based Programming Model for GPGPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2013
Early Experiences with the OpenMP Accelerator Model.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013
A Prototype Implementation of OpenMP Task Dependency Support.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013
Integrating Asynchronous Task Parallelism with MPI.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
2012
Performance and Power Characteristics of Matrix Multiplication Algorithms on Multicore and Shared Memory Machines.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Compile-Time Detection of False Sharing via Loop Cost Modeling.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Acceleration of bulk memory operations in a heterogeneous multicore architecture.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
Integrating MPI with Asynchronous Task Parallelism.
Proceedings of the Recent Advances in the Message Passing Interface, 2011
Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
2009
The habanero multicore software research project.
Proceedings of the Companion to the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2009
Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement.
Proceedings of the Languages and Compilers for Parallel Computing, 2009
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009
2007
Scientific workflow scheduling in computational grids - Planning, reservation, and data/network-awareness.
Proceedings of the 8th IEEE/ACM International Conference on Grid Computing (GRID 2007), 2007
2006
Campus Grids Meet Applications: Modeling, Metascheduling and Integration.
J. Grid Comput., 2006
A Feature-Rich Workflow Description Language that Supports Resource Co-allocations.
Proceedings of the High Performance Computing and Grids in Action, 2006
2003
An OGSI-compliant portal for campus grids.
Enhanced Interoperable Systems. Proceedings of the 10th ISPE International Conference on Concurrent Engineering (ISPE CE 2003), 2003