Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

A data-oriented profiler to assist in data partitioning and distribution for heterogeneous memory in HPC.

[DOI]

Assefaw Hadish Gebremedhin

Parallel Comput., 2016

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL.

[DOI]

Parallel Comput., 2016

Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications.

[DOI]

Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

One-Sided Interface for Matrix Operations Using MPI-3 RMA: A Case Study with Elemental.

[DOI]

Barbara M. Chapman

Proceedings of the 45th International Conference on Parallel Processing, 2016

A Review of Lightweight Thread Approaches for High Performance Computing.

[DOI]

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015

Improving the user experience of the rCUDA remote GPU virtualization framework.

[DOI]

Concurr. Comput. Pract. Exp., 2015

VOCL-FT: introducing techniques for efficient soft error coprocessor recovery.

[DOI]

Wesley Bland

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2015

Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures.

[DOI]

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA.

[DOI]

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.

[DOI]

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA.

[DOI]

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

Toward Implementing Robust Support for Portals 4 Networks in MPICH.

[DOI]

Ken Raffenetti

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

Understanding Data Access Patterns Using Object-Differentiated Memory Profiling.

[DOI]

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2015

2014

A complete and efficient CUDA-sharing solution for HPC clusters.

[DOI]

Parallel Comput., 2014

MT-MPI: multithreaded MPI for many-core environments.

[DOI]

Proceedings of the 2014 International Conference on Supercomputing, 2014

A Framework for Tracking Memory Accesses in Scientific Applications.

[DOI]

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.

[DOI]

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

Toward the efficient use of multiple explicitly managed memory subsystems.

[DOI]

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

Analysis of topology-dependent MPI performance on Gemini networks.

[DOI]

Ralf G. Correa Carvalho

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Influence of InfiniBand FDR on the performance of remote GPU virtualization.

[DOI]

Carlos Reaño

Rafael Mayo

Federico Silla

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Evaluation of Inter- and Intra-node Data Transfer Efficiencies between GPU Devices and their Impact on Scalable Applications.

[DOI]

Sadaf R. Alam

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2013

2012

CU2rCU: Towards the complete rCUDA remote GPU virtualization and sharing solution.

[DOI]

Proceedings of the 19th International Conference on High Performance Computing, 2012

2011

Performance of CUDA Virtualized Remote GPUs in High Performance Clusters.

[DOI]

Proceedings of the International Conference on Parallel Processing, 2011

Enabling CUDA acceleration within virtual machines using rCUDA.

[DOI]

Federico Silla

Juan Carlos Fernández

Rafael Mayo

Proceedings of the 18th International Conference on High Performance Computing, 2011

2010

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters.

[DOI]

Proceedings of the 2010 International Conference on High Performance Computing & Simulation, 2010

2009

An Efficient Implementation of GPU Virtualization in High Performance Clusters.

[DOI]