Unified Collective Communication: A Unified Library for CPU, GPU, and DPU Collectives.
IEEE Micro, 2025
Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI.
Proceedings of the International Conference for High Performance Computing, 2024
Offloaded MPI message matching: an optimistic approach.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
Protocol Buffer Deserialization DPU Offloading in the RPC Datapath.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
NVIDIA's Cloud Native Supercomputing.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021
Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation.
Proceedings of the High Performance Computing - 35th International Conference, 2020
Accelerating OpenSHMEM Collectives Using In-Network Computing Approach.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019
Towards A Data Centric System Architecture: SHARP.
Supercomputing Frontiers and Innovations, 2017
Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016
Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010
Investigations on InfiniBand: Efficient Network Buffer Utilization at Scale.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007