Gilad Shainer

According to our database, Gilad Shainer authored at least 33 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Optimizing Application Performance with BlueField: Accelerating Large-Message Blocking and Nonblocking Collective Operations.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2022
NVIDIA's Quantum InfiniBand Network Congestion Control Technology and Its Impact on Application Performance.
Proceedings of the High Performance Computing - 37th International Conference, 2022

2021
NVIDIA's Cloud Native Supercomputing.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

2020
The high-speed networks of the Summit and Sierra supercomputers.
IBM J. Res. Dev., 2020

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation.
Proceedings of the High Performance Computing - 35th International Conference, 2020

2019
Accelerating OpenSHMEM Collectives Using In-Network Computing Approach.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

2018
LRUM: Local Reliability Protocol for Unreliable Hardware Multicast.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018

2017
Towards A Data Centric System Architecture: SHARP.
Supercomput. Front. Innov., 2017

Enabling One-Sided Communication Semantics on ARM.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
Scalable Hierarchical Aggregation Protocol (SHArP): A Hardware Architecture for Efficient Data Reduction.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

Using InfiniBand Hardware Gather-Scatter Capabilities to Optimize MPI All-to-All.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

2015
Local and Remote GPUs Perform Similar with EDR 100G InfiniBand.
Proceedings of the Industrial Track of the 16th International Middleware Conference, 2015


2014
Development and Extension of Atomic Memory Operations in OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Boosting the performance of remote GPU virtualization using InfiniBand connect-IB and PCIe 3.0.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
The co-design architecture for exascale systems, a novel approach for scalable designs.
Comput. Sci. Res. Dev., 2013

Maximizing Application Performance in a Multi-core, NUMA-Aware Compute Cluster by Multi-level Tuning.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

2012
Exploring the Scope of the InfiniBand Congestion Control Mechanism.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
The development of Mellanox/NVIDIA GPUDirect over InfiniBand - a new model for GPU to GPU communications.
Comput. Sci. Res. Dev., 2011

The development of Mellanox/NVIDIA GPUDirect over InfiniBand: a new model for GPU to GPU communications.
Proceedings of the 2011 TeraGrid Conference - Extreme Digital Discovery, 2011

ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

The ParaPhrase Project: Parallel Patterns for Adaptive Heterogeneous Multicore Systems.
Proceedings of the Formal Methods for Components and Objects, 10th International Symposium, 2011

On the Relation between Congestion Control, Switch Arbitration and Fairness.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Cheetah: A Framework for Scalable Hierarchical Collective Operations.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Network Offloaded Hierarchical Collectives Using ConnectX-2's CORE-Direct Capabilities.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

First experiences with congestion control in InfiniBand hardware.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Optics for Enabling Future HPC Systems.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Scheduling strategies for HPC as a service (HPCaaS).
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2006
Multi-core usage - Multi-core clusters usage model.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Architecture and Implementation of Sockets Direct Protocol in Windows.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

