Devendar Bureddy

According to our database, Devendar Bureddy authored at least 14 papers between 2011 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2020
Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation.
Proceedings of the High Performance Computing - 35th International Conference, 2020

2017
Towards A Data Centric System Architecture: SHARP.
Supercomput. Front. Innov., 2017

2016
Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

2014
GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation.
IEEE Trans. Parallel Distributed Syst., 2014

2013
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters.
Proceedings of the International Conference for High Performance Computing, 2013

Extending OpenSHMEM for GPU Computing.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Design of network topology aware scheduling services for large InfiniBand clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Efficient Intra-node Communication on Intel-MIC Clusters.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

2011
Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

