2021
SHMEM-ML: Leveraging OpenSHMEM and Apache Arrow for Scalable, Composable Machine Learning.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 2021
2020
Smoky Mountain Data Challenge 2020: An Open Call to Solve Data Problems in the Areas of Neutron Science, Material Science, Urban Modeling and Dynamics, Geophysics, and Biomedical Informatics.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020
Integrating Inter-Node Communication with a Resilient Asynchronous Many-Task Runtime System.
Proceedings of the Workshop on Exascale MPI, 2020
HOOVER: Leveraging OpenSHMEM for High Performance, Flexible Streaming Graph Applications.
Proceedings of the 3rd IEEE/ACM Annual Parallel Applications Workshop: Alternatives To MPI+X, 2020
2018
Data-parallel distributed training of very large models beyond GPU capacity.
CoRR, 2018
A One Year Retrospective on a MOOC in Parallel, Concurrent, and Distributed Programming in Java.
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018
A Unified Runtime for PGAS and Event-Driven Programming.
Proceedings of the 4th International Workshop on Extreme Scale Programming Models and Middleware, 2018
HOOVER: Distributed, Flexible, and Scalable Streaming Graph Processing on OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018
S2FA: an accelerator automation framework for heterogeneous computing in datacenters.
Proceedings of the 55th Annual Design Automation Conference, 2018
2017
Deadlock avoidance in parallel programs with futures: why parallel tasks should not wait for strangers.
Proc. ACM Program. Lang., 2017
Pedagogy and tools for teaching parallel computing at the sophomore undergraduate level.
J. Parallel Distributed Comput., 2017
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017
Graph500 on OpenSHMEM: Using A Practical Survey of Past Work to Motivate Novel Algorithmic Developments.
Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017
Implementation and Evaluation of OpenSHMEM Contexts Using OFI Libfabric.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017
Preparing an Online Java Parallel Computing Course.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
A Pluggable Framework for Composable HPC Scheduling Libraries.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
2016
HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications.
IEEE Trans. Parallel Distributed Syst., 2016
A survey of sparse matrix-vector multiplication performance on large matrices.
CoRR, 2016
Static Cost Estimation for Data Layout Selection on GPUs.
Proceedings of the 7th International Workshop on Performance Modeling, 2016
Integrating Asynchronous Task Parallelism with OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016
OpenMP as a High-Level Specification Language for Parallelism - And its use in Evaluating Parallel Programming Systems.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
Efficient Checkpointing of Multi-threaded Applications as a Tool for Debugging, Performance Tuning, and Resiliency.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
SWAT: A Programmable, In-Memory, Distributed, High-Performance Computing Platform.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
2015
Auto-grading for parallel programs.
Proceedings of the Workshop on Education for High-Performance Computing, 2015
HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators.
Proceedings of the Principles and Practices of Programming on The Java Platform, 2015
2013
Accelerating Habanero-Java programs with OpenCL generation.
Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, 2013
Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2013
HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Integrating Asynchronous Task Parallelism with MPI.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Compiler-Driven Data Layout Transformation for Heterogeneous Platforms.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013
2011
Dynamic Task Parallelism with a GPU Work-Stealing Runtime System.
Proceedings of the Languages and Compilers for Parallel Computing, 2011
2010
CnC-CUDA: Declarative Programming for GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2010
2009
JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009