Ahmad Afsahi

According to our database, Ahmad Afsahi authored at least 70 papers between 1997 and 2024.

Bibliography

2024
ROCm-Aware Leader-based Designs for MPI Neighbourhood Collectives.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

A Topology- and Load-Aware Design for Neighborhood Allgather.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
A Dynamic Network-Native MPI Partitioned Aggregation Over InfiniBand Verbs.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
Accelerating Deep Learning Using Interconnect-Aware UCX Communication for MPI Collectives.
IEEE Micro, 2022

Efficient Process Arrival Pattern Aware Collective Communication for Deep Learning.
Proceedings of the EuroMPI/USA'22: 29th European MPI Users' Group Meeting, Chattanooga, TN, USA, September 26, 2022

Micro-Benchmarking MPI Partitioned Point-to-Point Communication.
Proceedings of the 51st International Conference on Parallel Processing, 2022

2021
Efficient Multi-Path NVLink/PCIe-Aware UCX based Collective Communication for Deep Learning.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2021

2020
Communication-aware message matching in MPI.
Concurr. Comput. Pract. Exp., 2020

2019
A dynamic, unified design for dedicated message matching engines for collective and point-to-point communications.
Parallel Comput., 2019

An Efficient Collaborative Communication Mechanism for MPI Neighborhood Collectives.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Fuzzy Matching: Hardware Accelerated MPI Communication Middleware.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

2018
Design considerations for GPU-aware collective communications in MPI.
Concurr. Comput. Pract. Exp., 2018

A Dedicated Message Matching Mechanism for Collective Communications.
Proceedings of the 47th International Conference on Parallel Processing, 2018

The Case for Semi-Permanent Cache Occupancy: Understanding the Impact of Data Locality on Network Processing.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
Exploiting heterogeneity of communication channels for efficient GPU selection on multi-GPU nodes.
Parallel Comput., 2017

Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

2016
MAGC: A Mapping Approach for GPU Clusters.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Topology-Aware Rank Reordering for MPI Collectives.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

PTRAM: A Parallel Topology- and Routing-Aware Mapping Framework for Large-Scale HPC Systems.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Topology-Aware GPU Selection on Multi-GPU Nodes.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015
Scalable Network Communication Using Unreliable RDMA.
Handbook on Data Centers, 2015

Scalable connectionless RDMA over unreliable datagrams.
Parallel Comput., 2015

Hyper-Q aware intranode MPI collectives on the GPU.
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, 2015

2014
Extreme-scale computing services over MPI: Experiences, observations and features proposal for next-generation message passing interface.
Int. J. High Perform. Comput. Appl., 2014

A fast and resource-conscious MPI message queue mechanism for large-scale jobs.
Future Gener. Comput. Syst., 2014

Nonblocking Epochs in MPI One-Sided Communication.
Proceedings of the International Conference for High Performance Computing, 2014

Intra-Epoch Message Scheduling To Exploit Unused or Residual Overlapping Potential.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

GPU-Aware Intranode MPI_Allreduce.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

2013
Using MPI in high-performance computing services.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Mercury: Enabling remote procedure call for high-performance computing.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Toward Asynchronous and MPI-Interoperable Active Messages.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
An Efficient MPI Message Queue Mechanism for Large-scale Jobs.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

A study of hardware performance monitoring counter selection in power modeling of computing systems.
Proceedings of the 2012 International Green Computing Conference, 2012

Designing an Offloaded Nonblocking MPI_Allgather Collective Using CORE-Direct.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

2011
Process Arrival Pattern Aware Alltoall and Allgather on InfiniBand Clusters.
Int. J. Parallel Program., 2011

Exploiting application buffer reuse to improve MPI small message transfer protocols over RDMA-enabled networks.
Clust. Comput., 2011

Multi-core and Network Aware MPI Topology Functions.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

RDMA Capable iWARP over Datagrams.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Investigating Scenario-Conscious Asynchronous Rendezvous over RDMA.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Adaptive estimation and prediction of power and performance in high performance computing.
Comput. Sci. Res. Dev., 2010

A study of hardware assisted IP over InfiniBand and its impact on enterprise data center performance.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

iWARP redefined: Scalable connectionless communication over high-speed Ethernet.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

2009
A Speculative and Adaptive MPI Rendezvous Protocol Over RDMA-enabled Interconnects.
Int. J. Parallel Program., 2009

Improving energy efficiency of asymmetric chip multithreaded multiprocessors through reduced OS noise scheduling.
Concurr. Comput. Pract. Exp., 2009

Process Arrival Pattern and Shared Memory Aware Alltoall on InfiniBand.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Improving RDMA-based MPI eager protocol for frequently-used buffers.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

2008
Efficient shared memory and RDMA based collectives on multi-rail QsNet II SMP clusters.
Clust. Comput., 2008

An Analysis of QoS Provisioning for Sockets Direct Protocol vs. IPoIB over Modern InfiniBand Networks.
Proceedings of the 37th International Conference on Parallel Processing, 2008

Improving Communication Progress and Overlap in MPI Rendezvous Protocol over RDMA-enabled Interconnects.
Proceedings of the 22nd Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2008), 2008

2007
10-Gigabit iWARP Ethernet: Comparative Performance Analysis with InfiniBand and Myrinet-10G.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Comprehensive Analysis of OpenMP Applications on Dual-Core Intel Xeon SMPs.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

RDMA-based and SMP-aware Multi-port All-Gather on Multi-rail QsNet II SMP Clusters.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

High Performance RDMA-based Multi-port All-gather on Multi-rail QsNet II.
Proceedings of the 21st Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2007), 2007

Assessing the Ability of Computation/Communication Overlap and Communication Progress in Modern Interconnects.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007

A feasibility analysis of power-awareness and energy minimization in modern interconnects for high-performance computing.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Improving system efficiency through scheduling and power management.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Performance evaluation of the Sun Fire Link SMP clusters.
Int. J. High Perform. Comput. Netw., 2006

Efficient RDMA-based multi-port collectives on multi-rail QsNet II clusters.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Power-performance efficiency of asymmetric multiprocessors for multi-threaded scientific applications.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Communication Characteristics of Message-Passing Scientific and Engineering Applications.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2005

2004
Myrinet Networks: A Performance Study.
Proceedings of the 3rd IEEE International Symposium on Network Computing and Applications (NCA 2004), 30 August, 2004

2003
Performance characteristics of OpenMP constructs, and application benchmarks on a large symmetric multiprocessor.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003

2002
Analysis of a Latency Hiding Broadcasting Algorithm on a Reconfigurable Optical Interconnect.
Parallel Process. Lett., 2002

Efficient communication using message prediction for clusters of multiprocessors.
Concurr. Comput. Pract. Exp., 2002

Architectural Extensions to Support Efficient Communication Using Message Prediction.
Proceedings of the 16th Annual International Symposium on High Performance Computing Systems and Applications, 2002

2000
Efficient Communication Using Message Prediction for Cluster Multiprocessors.
Proceedings of the Network-Based Parallel Computing: Communication, 2000

1999
Hiding Communication Latency in Reconfigurable Message-Passing Environments.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1998
Communications Latency Hiding Techniques for a Reconfigurable Optical Interconnect: Benchmark Studies.
Proceedings of the Applied Parallel Computing, 1998

1997
Collective Communications on a Reconfigurable Optical Interconnect.
Proceedings of the On Principles Of Distributed Systems, 1997

