2021

SHMEM-ML: Leveraging OpenSHMEM and Apache Arrow for Scalable, Composable Machine Learning.

Howard Pritchard

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 2021

2020

Smoky Mountain Data Challenge 2020: An Open Call to Solve Data Problems in the Areas of Neutron Science, Material Science, Urban Modeling and Dynamics, Geophysics, and Biomedical Informatics.

Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Integrating Inter-Node Communication with a Resilient Asynchronous Many-Task Runtime System.

Akihiro Hayashi, Matthew Whitlock, Keita Teranishi, Jackson R. Mayo

Proceedings of the Workshop on Exascale MPI, 2020

HOOVER: Leveraging OpenSHMEM for High Performance, Flexible Streaming Graph Applications.

Howard Pritchard

Proceedings of the 3rd IEEE/ACM Annual Parallel Applications Workshop: Alternatives To MPI+X, 2020

2018

Data-parallel distributed training of very large models beyond GPU capacity.

CoRR, 2018

A One Year Retrospective on a MOOC in Parallel, Concurrent, and Distributed Programming in Java.

Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

A Unified Runtime for PGAS and Event-Driven Programming.

Akihiro Hayashi

Proceedings of the 4th International Workshop on Extreme Scale Programming Models and Middleware, 2018

HOOVER: Distributed, Flexible, and Scalable Streaming Graph Processing on OpenSHMEM.

Howard Pritchard

Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

S2FA: an accelerator automation framework for heterogeneous computing in datacenters.

Proceedings of the 55th Annual Design Automation Conference, 2018

2017

Deadlock avoidance in parallel programs with futures: why parallel tasks should not wait for strangers.

Tiago Cogumbreiro, Rishi Surendran, Francisco Martins, Vasco T. Vasconcelos

Proc. ACM Program. Lang., 2017

Pedagogy and tools for teaching parallel computing at the sophomore undergraduate level.

J. Parallel Distributed Comput., 2017

Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages.

Akihiro Hayashi

Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Graph500 on OpenSHMEM: Using A Practical Survey of Past Work to Motivate Novel Algorithmic Developments.

Howard Pritchard

Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017

Implementation and Evaluation of OpenSHMEM Contexts Using OFI Libfabric.

Howard Pritchard

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Preparing an Online Java Parallel Computing Course.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

A Pluggable Framework for Composable HPC Scheduling Libraries.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016

HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications.

Maurício Breternitz Jr.

IEEE Trans. Parallel Distributed Syst., 2016

A survey of sparse matrix-vector multiplication performance on large matrices.

Christopher Thiele, Mauricio Araya-Polo

CoRR, 2016

Static Cost Estimation for Data Layout Selection on GPUs.

Proceedings of the 7th International Workshop on Performance Modeling, 2016

Integrating Asynchronous Task Parallelism with OpenSHMEM.

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

OpenMP as a High-Level Specification Language for Parallelism - And its use in Evaluating Parallel Programming Systems.

Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Efficient Checkpointing of Multi-threaded Applications as a Tool for Debugging, Performance Tuning, and Resiliency.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

SWAT: A Programmable, In-Memory, Distributed, High-Performance Computing Platform.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015

Auto-grading for parallel programs.

Proceedings of the Workshop on Education for High-Performance Computing, 2015

HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators.

Proceedings of the Principles and Practices of Programming on The Java Platform, 2015

2013

Accelerating Habanero-Java programs with OpenCL generation.

Akihiro Hayashi

Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, 2013

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs.

Akihiro Hayashi

Proceedings of the Languages and Compilers for Parallel Computing, 2013

HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL.

Maurício Breternitz Jr.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Integrating Asynchronous Task Parallelism with MPI.

Sanjay Chatterjee, Sagnak Tasirlar

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Compiler-Driven Data Layout Transformation for Heterogeneous Platforms.

Rajkishore Barik

Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

2011

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System.

Sanjay Chatterjee, Alina Simion Sbîrlea

Proceedings of the Languages and Compilers for Parallel Computing, 2011

2010

CnC-CUDA: Declarative Programming for GPUs.

Alina Simion Sbîrlea

Proceedings of the Languages and Compilers for Parallel Computing, 2010

2009

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009