Hari Subramoni
ORCID: 0000-0002-1200-2754
According to our database, Hari Subramoni authored at least 189 papers between 2008 and 2024.
Bibliography
2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer.
CoRR, 2024
Concurr. Comput. Pract. Exp., 2024
OMB-CXL: A Micro-Benchmark Suite for Evaluating MPI Communication Utilizing Compute Express Link Memory Devices.
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning.
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the 53rd International Conference on Parallel Processing, 2024
OHIO: Improving RDMA Network Scalability in MPI_Alltoall Through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
Characterizing Communication in Distributed Parameter-Efficient Fine-Tuning for Large Language Models.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
Proceedings of the 24th IEEE International Symposium on Cluster, 2024
2023
J. Comput. Sci. Technol., February, 2023
IEEE Micro, 2023
Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version.
CoRR, 2023
Proceedings of the Practice and Experience in Advanced Research Computing, 2023
Proceedings of the Practice and Experience in Advanced Research Computing, 2023
Proceedings of the High Performance Computing - 38th International Conference, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Performance Characterization of Using Quantization for DNN Inference on Edge Devices.
Proceedings of the 7th IEEE International Conference on Fog and Edge Computing, 2023
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023
Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023
Optimized All-to-All Connection Establishment for High-Performance MPI Libraries Over InfiniBand.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023
Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
HARVEST: High-Performance Artificial Vision Framework for Expert Labeling using Semi-Supervised Training.
Proceedings of the IEEE International Conference on Big Data, 2023
Proceedings of the IEEE International Conference on Big Data, 2023
Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2023
2022
IEEE Micro, 2022
Proceedings of the PEARC '22: Practice and Experience in Advanced Research Computing, Boston, MA, USA, July 10, 2022
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters.
Proceedings of the High Performance Computing - 37th International Conference, 2022
Proceedings of the High Performance Computing - 37th International Conference, 2022
Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters.
Proceedings of the High Performance Computing - 37th International Conference, 2022
Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Proceedings of the IEEE/ACM International Workshop on Education for High Performance Computing, 2022
Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI.
Proceedings of the IEEE International Conference on Cluster Computing, 2022
2021
The MVAPICH project: Transforming research into high-performance MPI library for HPC community.
J. Comput. Sci., 2021
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters.
CoRR, 2021
Proceedings of the PEARC '21: Practice and Experience in Advanced Research Computing, 2021
Proceedings of the High Performance Computing - 36th International Conference, 2021
BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs.
Proceedings of the High Performance Computing - 36th International Conference, 2021
Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021
Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021
Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2021
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021
Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021
2020
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects.
IEEE Micro, 2020
FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures.
J. Parallel Distributed Comput., 2020
EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.
Concurr. Comput. Pract. Exp., 2020
Proceedings of the PEARC '20: Practice and Experience in Advanced Research Computing, 2020
Proceedings of the High Performance Computing - 35th International Conference, 2020
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow.
Proceedings of the High Performance Computing - 35th International Conference, 2020
Proceedings of the Fourth IEEE/ACM Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2020
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training.
Proceedings of the International Conference for High Performance Computing, 2020
Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR.
Proceedings of the 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2020
Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System.
Proceedings of the Workshop on Exascale MPI, 2020
Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020
Proceedings of the IEEE International Conference on Cluster Computing, 2020
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020
2019
IEEE Trans. Parallel Distributed Syst., 2019
Parallel Comput., 2019
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?
Parallel Comput., 2019
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow.
CoRR, 2019
Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences.
Proceedings of the High Performance Computing, 2019
Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2.
Proceedings of the IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2019
Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast.
Proceedings of the IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2019
OMB-UM: Design, Implementation, and Evaluation of CUDA Unified Memory Aware MPI Benchmarks.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera.
Proceedings of the Third IEEE/ACM Workshop on Deep Learning on Supercomputers, 2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter.
Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019
Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures.
Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2019
2018
MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU.
Parallel Comput., 2018
Frontiers Inf. Technol. Electron. Eng., 2018
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 25th European MPI Users' Group Meeting, 2018
Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures.
Proceedings of the 25th European MPI Users' Group Meeting, 2018
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
Proceedings of the 25th European MPI Users' Group Meeting, 2018
Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training.
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018
Proceedings of the IEEE International Conference on Cluster Computing, 2018
2017
Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication.
Proceedings of the High Performance Computing - 32nd International Conference, 2017
Proceedings of the International Conference for High Performance Computing, 2017
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures.
Proceedings of the Machine Learning on HPC Environments, 2017
MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU.
Proceedings of the 24th European MPI Users' Group Meeting, 2017
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning.
Proceedings of the 46th International Conference on Parallel Processing, 2017
Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017
A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
2016
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters.
Parallel Comput., 2016
Proceedings of the High Performance Computing - 31st International Conference, 2016
Designing MPI library with on-demand paging (ODP) of infiniband: challenges and benefits.
Proceedings of the International Conference for High Performance Computing, 2016
Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications.
Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016
Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016
Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016
Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016
2015
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters.
Proceedings of the High Performance Computing - 30th International Conference, 2015
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms.
Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015
Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.
Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015
High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015
2014
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences.
Proceedings of the Supercomputing - 29th International Conference, 2014
Proceedings of the 21st European MPI Users' Group Meeting, 2014
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014
Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters.
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the International Conference on Computing, Networking and Communications, 2014
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters.
Proceedings of the 21st International Conference on High Performance Computing, 2014
2013
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for intel MIC clusters.
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand.
Proceedings of the International Conference on Supercomputing, 2013
Proceedings of the 42nd International Conference on Parallel Processing, 2013
A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems.
Proceedings of the 42nd International Conference on Parallel Processing, 2013
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
2012
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Understanding the communication characteristics in HBase: What are the fundamental bottlenecks?
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing Systems.
Proceedings of the IEEE 20th Annual Symposium on High-Performance Interconnects, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012
Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms?
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012
2011
Proceedings of the Encyclopedia of Parallel Computing, 2011
High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT.
Comput. Sci. Res. Dev., 2011
Proceedings of the International Conference on Parallel Processing, 2011
Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL.
Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011
Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011
2010
IEEE Comput. Archit. Lett., 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2.
Proceedings of the 39th International Conference on Parallel Processing, 2010
Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters.
Proceedings of the 39th International Conference on Parallel Processing, 2010
Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine.
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010
Proceedings of the Scientific Computing with Multicore and Accelerators, 2010
2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand.
Proceedings of the ICPP 2009, 2009
Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
2008
Proceedings of the 2008 International Conference on Parallel Processing, 2008