2025

Unified Collective Communication: A Unified Library for CPU, GPU, and DPU Collectives.

Manjunath Gorentla Venkata, Valentine Petrov, [...], Devendar Bureddy, Ferrol Aderholdt, [...]

IEEE Micro, 2025

2024

Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI.

Mikhail Khalilov, Salvatore Di Girolamo, [...], Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2024

Offloaded MPI message matching: an optimistic approach.

Jerónimo S. García, Salvatore Di Girolamo, [...], J. J. Vegas Olmos, [...], Torsten Hoefler, [...]

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Protocol Buffer Deserialization DPU Offloading in the RPC Datapath.

Raphaël Frantz, Jerónimo Sánchez García, [...], Idelfonso Tafur Monroy, Juan José Vegas Olmos, [...], Salvatore Di Girolamo

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives.

Manjunath Gorentla Venkata, Valentine Petrov, [...], Devendar Bureddy, Ferrol Aderholdt, [...]

Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2021

NVIDIA's Cloud Native Supercomputing.

[...], Richard L. Graham, Chris J. Newburn, Oscar R. Hernandez, [...]

Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

2020

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation.

Richard L. Graham, [...], Devendar Bureddy, [...], Valentin Petrov, [...]

Proceedings of the High Performance Computing - 35th International Conference, 2020

2019

Accelerating OpenSHMEM Collectives Using In-Network Computing Approach.

Manjunath Gorentla Venkata, [...], Richard L. Graham

Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

2017

Towards A Data Centric System Architecture: SHARP.

Richard L. Graham, [...], Devendar Bureddy, [...]

Supercomput. Front. Innov., 2017

2016

Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction.

Richard L. Graham, Devendar Bureddy, [...], Dror Goldenberg, [...], Sasha Kotchubievsky, Vladimir Koushnir, [...], Alexander Shpiner, [...]

Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

2010

Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities.

Richard L. Graham, Stephen W. Poole, [...], Ishai Rabinovitz, [...]

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations.

Richard L. Graham, [...], Ishai Rabinovitz, [...]

Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2007

Investigations on InfiniBand: Efficient Network Buffer Utilization at Scale.

Galen M. Shipman, [...], Jeffrey M. Squyres, [...]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007