Amith R. Mamidala

According to our database, Amith R. Mamidala authored at least 33 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Collage: Light-Weight Low-Precision Strategy for LLM Training.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2018
Efficient Embedding of MPI Collectives in MXNET DAGs for scaling Deep Learning.
CoRR, 2018

MXNET-MPI: Embedding MPI parallelism in Parameter Server Task Model for scaling Deep Learning.
CoRR, 2018

2016
Optimization of Message Passing Services on POWER8 InfiniBand Clusters.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

2014
Optimization of MPI collective operations on the IBM Blue Gene/Q supercomputer.
Int. J. High Perform. Comput. Appl., 2014

2013
IBM Blue Gene/Q system software stack.
IBM J. Res. Dev., 2013

2012
Looking under the hood of the IBM Blue Gene/Q network.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

PAMI: A Parallel Active Message Interface for the Blue Gene/Q Supercomputer.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

2011
Optimizing MPI Collectives Using Efficient Intra-node Communication Techniques over the Blue Gene/P Supercomputer.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010
Architecture of the Component Collective Messaging Interface.
Int. J. High Perform. Comput. Appl., 2010

2009
Topology agnostic hot-spot avoidance with InfiniBand.
Concurr. Comput. Pract. Exp., 2009

MPI collective communications on the Blue Gene/P supercomputer: algorithms and optimizations.
Proceedings of the 23rd international conference on Supercomputing, 2009

Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand.
Proceedings of the ICPP 2009, 2009

MPI Collective Communications on The Blue Gene/P Supercomputer: Algorithms and Optimizations.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Design alternatives for implementing fence synchronization in MPI-2 one-sided communication for InfiniBand clusters.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Scaling alltoall collective on multi-core systems.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
MPI-2 One-Sided Usage and Implementation for Read Modify Write Operations: A Case Study with HPCC.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Automatic Path Migration over InfiniBand: Early Experiences.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High Performance MPI over iWARP: Early Experiences.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006
Scalable systems software - A software based approach for providing network fault tolerance in clusters with uDAPL interface: MPI level design and performance evaluation.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Evaluating InfiniBand Performance with PCI Express.
IEEE Micro, 2005

Efficient Hardware Multicast Group Management for Multiple MPI Communicators over InfiniBand.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters.
Proceedings of the High Performance Computing, 2005

2004
Fast and Scalable MPI-Level Broadcast Using InfiniBand's Hardware Multicast Support.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Performance evaluation of InfiniBand with PCI Express.
Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects, 2004

Efficient Barrier and Allreduce on InfiniBand clusters using multicast and adaptive algorithms.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004
