Pavel Shamis

According to our database, Pavel Shamis authored at least 34 papers between 2008 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Nemotron-4 340B Technical Report.
CoRR, 2024

2022
Bring the BitCODE - Moving Compute and Data in Distributed Heterogeneous Systems.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
UCX Programming Interface for Remote Function Injection and Invocation.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 2021

Two-Chains: High Performance Framework for Function Injection and Execution.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020
OpenSHMEM I/O Extensions for Fine-Grained Access to Persistent Memory Storage.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Using Arm Scalable Vector Extension to Optimize OPEN MPI.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019
Breaking Band: A Breakdown of High-performance Communication.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
Distributed Task-Based Runtime Systems - Current State and Micro-Benchmark Performance.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

2017
Enabling One-Sided Communication Semantics on ARM.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

2016
OpenSHMEM-UCX: Evaluation of UCX for Implementing OpenSHMEM Programming Model.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

2015
Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

An Evaluation of OpenSHMEM Interfaces for the Variable-Length Alltoallv() Collective Operation.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Check-Pointing Approach for Fault Tolerance in OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015


2014
Extending the OpenSHMEM Memory Model to Support User-Defined Spaces.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Development and Extension of Atomic Memory Operations in OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

OpenSHMEM Reference Implementation using UCCS-uGNI Transport Layer.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Fault Tolerance for OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Designing a High Performance OpenSHMEM Implementation Using Universal Common Communication Substrate as a Communication Middleware.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

OpenSHMEM Extensions and a Vision for Its Future Direction.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

2013
The co-design architecture for exascale systems, a novel approach for scalable designs.
Comput. Sci. Res. Dev., 2013

Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

2012
Exploiting Atomic Operations for Barrier on Cray XE/XK Systems.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exploring the All-to-All Collective Optimization Space with ConnectX CORE-Direct.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Analyzing the Effects of Multicore Architectures and On-Host Communication Characteristics on Collective Communications.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Cheetah: A Framework for Scalable Hierarchical Collective Operations.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Network Offloaded Hierarchical Collectives Using ConnectX-2's CORE-Direct Capabilities.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Designing high-performance and resilient message passing on InfiniBand.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Overlapping computation and communication: Barrier algorithms and ConnectX-2 CORE-Direct capabilities.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

ConnectX-2 InfiniBand Management Queues: First Investigation of the New Support for Network Offloaded Collective Operations.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2008
X-SRQ - Improving Scalability and Performance of Multi-core InfiniBand Clusters.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

