Akshay Venkatesh
Orcid: 0009-0006-1010-6476
According to our database, Akshay Venkatesh authored at least 34 papers between 2002 and 2024.
Online presence:
- on zbmath.org
- on id.loc.gov
- on d-nb.info
Bibliography
2024
ACM Trans. Graph., July, 2024
2022
2019
2017
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling.
Proceedings of the 46th International Conference on Parallel Processing, 2017
2016
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters.
Parallel Comput., 2016
Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016
Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016
Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016
2015
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters.
Proceedings of the High Performance Computing - 30th International Conference, 2015
Proceedings of the International Conference for High Performance Computing, 2015
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015
Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms.
Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015
Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015
2014
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences.
Proceedings of the Supercomputing - 29th International Conference, 2014
A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
MIC-Check: a distributed check pointing framework for the Intel many integrated cores architecture.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters.
Proceedings of the 21st International Conference on High Performance Computing, 2014
2013
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters.
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.
Proceedings of the 42nd International Conference on Parallel Processing, 2013
Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.
Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013
2012
MPI-based parallel synchronous vector evaluated particle swarm optimization for multi-objective design optimization of composite structures.
Eng. Appl. Artif. Intell., 2012
Proceedings of the Recent Advances in the Message Passing Interface, 2012
2002