Kewei Yan

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

2023

Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks.

[DOI]

Kewei Yan

Yaying Shi

Proceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores, 2023

2022

Generating and Analyzing Program Call Graphs using Ontology.

[DOI]

Ethan Dorta

Proceedings of the IEEE/ACM Workshop on Programming and Performance Visualization Tools, 2022

Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures.

[DOI]

Patrick J. Flynn

Xinyao Yi

Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores (PMAM@PPoPP 2022), Virtual Event / Seoul, Republic of Korea, April 2, 2022

Stacking Feature Maps of Multi-scaled Medical Images in U-Net for 3D Head and Neck Tumor Segmentation.

[DOI]

Yaying Shi

Xiaodong Zhang

Proceedings of the Head and Neck Tumor Segmentation and Outcome Prediction, 2022

Applying Quadratic Penalty Method for Intensity-Based Deformable Image Registration on BraTS-Reg Challenge 2022.

[DOI]

Kewei Yan

Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2022

Experimenting FedML and NVFLARE for Federated Tumor Segmentation Challenge.

[DOI]

Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2022

UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models.

[DOI]

Xinyao Yi

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021

Extending OpenMP for Machine Learning-Driven Adaptation.

[DOI]

Giorgis Georgakoudis

David Beckingsale

Todd Gamblin

Proceedings of the Accelerator Programming Using Directives - 8th International Workshop, 2021

RDS: a cloud-based metaservice for detecting data races in parallel programs.

[DOI]

Proceedings of the 2021 IEEE/ACM 14th International Conference on Utility and Cloud Computing (UCC '21), Leicester, United Kingdom, December 6, 2021

CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming.

[DOI]

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

An Ensemble Approach to Automatic Brain Tumor Segmentation.

[DOI]

Proceedings of the Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, 2021

2020

Enhancing DataRaceBench for Evaluating Data Race Detection Tools.

[DOI]

Proceedings of the 4th IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2020

Extending FreeCompilerCamp.org as an Online Self-Learning Platform for Compiler Development.

[DOI]

Proceedings of the IEEE/ACM Workshop on Education for High-Performance Computing, 2020

Supporting Data Shuffle Between Threads in OpenMP.

[DOI]

Xinyao Yi

Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

2019

Extending OpenMP Map Clause to Bridge Storage and Device Memory.

[DOI]

Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019

Ompparser: A Standalone and Unified OpenMP Parser.

[DOI]

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

Extending OpenMP Metadirective Semantics for Runtime Adaptation.

[DOI]

Thomas R. W. Scogland

Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

2018

A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge.

[DOI]

CoRR, 2018

2017

Principles of Memory-Centric Programming for High Performance Computing.

[DOI]

Ron Brightwell

Xian-He Sun

Proceedings of the Workshop on Memory Centric Programming for HPC, 2017

Evaluation of Knight Landing High Bandwidth Memory for HPC Workloads.

[DOI]

Solmaz Salehian

Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, 2017

HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems.

[DOI]

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Comparison of Threading Programming Models.

[DOI]

Solmaz Salehian

Jiawen Liu

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016

Compiler transformation of nested loops for general purpose GPUs.

[DOI]

Xiaonan Tian

Rengan Xu

Deepak Eachempati

Concurr. Comput. Pract. Exp., 2016

A Proposal to OpenMP for Addressing the CPU Oversubscription Challenge.

[DOI]

Jeff R. Hammond

Alexandre E. Eichenberger

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Comparison of Spark Resource Managers and Distributed File Systems.

[DOI]

Solmaz Salehian

Proceedings of the 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), 2016

2015

Programming Models, Languages, and Compilers for Manycore and Heterogeneous Architectures.

[DOI]

Xinmin Tian

Sci. Program., 2015

Supporting multiple accelerators in high-level programming models.

[DOI]

Pei-Hung Lin

Daniel J. Quinlan

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

2014

Reduction Operations in Parallel Loops for GPGPUs.

[DOI]

Rengan Xu

Xiaonan Tian

Proceedings of the 2014 PPOPP International Workshop on Programming Models and Applications for Multicores and Manycores, 2014

NAS Parallel Benchmarks for GPGPUs Using a Directive-Based Programming Model.

[DOI]

Rengan Xu

Xiaonan Tian

Proceedings of the Languages and Compilers for Parallel Computing, 2014

Predicting Cache Contention for Multithread Applications at Compile Time.

[DOI]

Munara Tolubaeva

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

2013

Compile Time Modeling of Off-Chip Memory Bandwidth for Parallel Loops.

[DOI]

Munara Tolubaeva

Proceedings of the Languages and Compilers for Parallel Computing, 2013

Compiling a High-Level Directive-Based Programming Model for GPGPUs.

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 2013

Early Experiences with the OpenMP Accelerator Model.

[DOI]

Daniel J. Quinlan

Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

A Prototype Implementation of OpenMP Task Dependency Support.

[DOI]

Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

Integrating Asynchronous Task Parallelism with MPI.

[DOI]

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012

Performance and Power Characteristics of Matrix Multiplication Algorithms on Multicore and Shared Memory Machines.

[DOI]

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Compile-Time Detection of False Sharing via Loop Cost Modeling.

[DOI]

Munara Tolubaeva

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Acceleration of bulk memory operations in a heterogeneous multicore architecture.

[DOI]

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Integrating MPI with Asynchronous Task Parallelism.

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures.

[DOI]

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2009

The habanero multicore software research project.

[DOI]

Proceedings of the Companion to the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2009

Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement.

[DOI]

Proceedings of the Languages and Compilers for Parallel Computing, 2009

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA.

[DOI]

Max Grossman

Vivek Sarkar

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2007

Scientific workflow scheduling in computational grids - Planning, reservation, and data/network-awareness.

[DOI]

Proceedings of the 8th IEEE/ACM International Conference on Grid Computing (GRID 2007), 2007

2006

Campus Grids Meet Applications: Modeling, Metascheduling and Integration.

[DOI]

J. Grid Comput., 2006

A Feature-Rich Workflow Description Language that Supports Resource Co-allocations.