Manjunath Gorentla Venkata

Orcid: 0000-0002-5282-1682

According to our database, Manjunath Gorentla Venkata authored at least 55 papers between 2006 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Unified Collective Communication (UCC): An Unified Library for CPU, GPU, and DPU Collectives.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2023
OpenSHMEM Queues: An abstraction for enhancing message rate, bandwidth utilization, and reducing tail latency in OpenSHMEM Applications.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2021
Hot Interconnects 27.
IEEE Micro, 2021

2020
A survey of MPI usage in the US exascale computing project.
Concurr. Comput. Pract. Exp., 2020

2019
Accelerating OpenSHMEM Collectives Using In-Network Computing Approach.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

2018
SharP Data Constructs: Data Constructs to Enable Data-Centric Computing.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Oak Ridge OpenSHMEM Benchmark Suite.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Designing High-Performance In-Memory Key-Value Operations with Persistent GPU Kernels and OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Tracking Memory Usage in OpenSHMEM Runtimes with the TAU Performance System.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

An Initial Implementation of Libfabric Conduit for OpenSHMEM-X.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

OpenSHMEM Sets and Groups: An Approach to Worksharing and Memory Management.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

SharP Unified Memory Allocator: An Intent-Based Memory Allocator for Extreme-Scale Systems.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

SHMEMGraph: Efficient and Balanced Graph Processing Using One-Sided Communication.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

2017
Efficient Breadth First Search on Multi-GPU Systems Using GPU-Centric OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Performance Analysis of OpenSHMEM Applications with TAU Commander.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Portable SHMEMCache: A High-Performance Key-Value Store on OpenSHMEM and MPI.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Evaluating Contexts in OpenSHMEM-X Reference Implementation.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Merged Requests for Better Performance and Productivity in Multithreaded OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Parallelizing Single Source Shortest Path with OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

SharP: Towards Programming Extreme-Scale Systems with Hierarchical Heterogeneous Memory.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

GPU-Centric Communication on NVIDIA GPU Clusters with InfiniBand: A Case Study with OpenSHMEM.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

SharP Hash: A High-Performing Distributed Hash for Extreme-Scale Systems.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

High-Performance Key-Value Store On OpenSHMEM.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
A hybrid computational strategy to address WGS variant analysis in >5000 samples.
BMC Bioinform., 2016

DISP: Optimizations towards Scalable MPI Startup.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

On Synchronisation and Memory Reuse in OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Investigating Data Motion Power Trends to Enable Power-Efficient OpenSHMEM Implementations.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Profiling Production OpenSHMEM Applications.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

SHMemCache: Enabling Memcached on the OpenSHMEM Global Address Model.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Surviving Errors with OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Evaluating OpenSHMEM Explicit Remote Memory Access Operations and Merged Requests.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

OpenSHMEM-UCX: Evaluation of UCX for Implementing OpenSHMEM Programming Model.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

2015
From MPI to OpenSHMEM: Porting LAMMPS.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Exploring OpenSHMEM Model to Program GPU-based Extreme-Scale Systems.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

An Evaluation of OpenSHMEM Interfaces for the Variable-Length Alltoallv() Collective Operation.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Parallelizing the Smith-Waterman Algorithm Using OpenSHMEM and MPI-3 One-Sided Interfaces.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015


Fast Fault Injection and Sensitivity Analysis for Collective Communications.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Development and Extension of Atomic Memory Operations in OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

OpenSHMEM Reference Implementation using UCCS-uGNI Transport Layer.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Fault Tolerance for OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Designing a High Performance OpenSHMEM Implementation Using Universal Common Communication Substrate as a Communication Middleware.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

OpenSHMEM Extensions and a Vision for Its Future Direction.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

2013
Optimizing blocking and nonblocking reduction operations for multicore systems: Hierarchical design and implementation.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

SLOAVx: Scalable LOgarithmic AlltoallV Algorithm for Hierarchical Multicore Systems.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Exploiting Atomic Operations for Barrier on Cray XE/XK Systems.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exploring the All-to-All Collective Optimization Space with ConnectX CORE-Direct.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Performance Evaluation of Open MPI on Cray XE/XK Systems.
Proceedings of the IEEE 20th Annual Symposium on High-Performance Interconnects, 2012

Assessing the Performance and Scalability of a Novel Multilevel K-Nomial Allgather on CORE-Direct Systems.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
ConnectX-2 CORE-Direct Enabled Asynchronous Broadcast Collective Communications.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Analyzing the Effects of Multicore Architectures and On-Host Communication Characteristics on Collective Communications.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011

Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Cheetah: A Framework for Scalable Hierarchical Collective Operations.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2009
Using application communication characteristics to drive dynamic MPI reconfiguration.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2006
MPI/CTP: A Reconfigurable MPI for HPC Applications.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

