Benjamin Klenk

Larry Dennison

Proceedings of the Optical Fiber Communications Conference and Exhibition, 2020

An In-Network Architecture for Accelerating Shared-Memory Multiprocessor Collectives.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2018

Communication architectures for scalable GPU-centric computing systems.

PhD thesis, 2018

2017

An Overview of MPI Characteristics of Exascale Proxy Applications.

Proceedings of High Performance Computing: 32nd International Conference, 2017

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

2016

Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy.

Parallel Computing, 2016

2015

Analyzing communication models for distributed thread-collaborative processors in terms of energy and time.

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

2014

Energy-efficient stencil computations on distributed GPUs using dynamic parallelism and GPU-controlled communication.

Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014

Analyzing Put/Get APIs for Thread-Collaborative Processors.

Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs.
