Dhabaleswar K. Panda

Karthik Vadambacheri Manian

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures.

Bharath Ramesh

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems.

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications.

Aamir Shafi

Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020

Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters.

Qinghua Zhou

Proceedings of the IEEE International Conference on Cluster Computing, 2020

Design and Characterization of InfiniBand Hardware Tag Matching in MPI.

Seyedeh Mahdieh Ghazimirsaeed

Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019

Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast.

IEEE Trans. Parallel Distributed Syst., 2019

Efficient design for MPI asynchronous progress without dedicated resources.

Amit Ruhela

Parallel Comput., 2019

Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?

Karthik Vadambacheri Manian

Parallel Comput., 2019

HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow.

CoRR, 2019

CCF THPC inaugural issue editorial.

Depai Qian

CCF Trans. High Perform. Comput., 2019

Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences.

Proceedings of the High Performance Computing, 2019

Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2.

Proceedings of the IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2019

Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast.

Proceedings of the IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, 2019

Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera.

Proceedings of the Third IEEE/ACM Workshop on Deep Learning on Supercomputers, 2019

High performance distributed deep learning: a beginner's guide.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Introduction to HPBDC 2019.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems.

Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter.

Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

SCOR-KV: SIMD-Aware Client-Centric and Optimistic RDMA-Based Key-Value Store for Emerging CPU Architectures.

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters.

Bharath Ramesh

Kaushik Kandadi Suresh

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems.

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters.

Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures.

Karthik Vadambacheri Manian

Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation.

Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019

Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures.

Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2019

2018

DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters.

IEEE Trans. Multi Scale Comput. Syst., 2018

MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU.

Parallel Comput., 2018

Networking and communication challenges for post-exascale systems.

Frontiers Inf. Technol. Electron. Eng., 2018

MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters.

J. Parallel Distributed Comput., 2018

Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences.

Rajarshi Biswas

CoRR, 2018

Analyzing, Modeling, and Provisioning QoS for NVMe SSDs.

Proceedings of the 11th IEEE/ACM International Conference on Utility and Cloud Computing, 2018

Cooperative rendezvous protocols for improved performance and overlap.

Proceedings of the International Conference for High Performance Computing, 2018

Efficient Asynchronous Communication Progress for MPI without Dedicated Resources.

Amit Ruhela

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures.

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Introduction to HPBDC 2018.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Accelerating TensorFlow with Adaptive RDMA-Based gRPC.

Rajarshi Biswas

Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training.

Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018

Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing.

M. Haseeb Javed

Proceedings of the IEEE International Conference on Cluster Computing, 2018

SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences.

Proceedings of the ACM Symposium on Cloud Computing, 2018

Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks.

Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures.

Haiyang Shi

Proceedings of the Benchmarking, Measuring, and Optimizing, 2018

2017

A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters.

IEEE Trans. Parallel Distributed Syst., 2017

Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached.

IEEE Data Eng. Bull., 2017

Stampede 2: The Evolution of an XSEDE Supercomputer.

Proceedings of the Practice and Experience in Advanced Research Computing 2017: Sustainability, 2017

Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand.

Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2017

Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds?

Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications.

Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication.

Proceedings of the High Performance Computing - 32nd International Conference, 2017

Scalable reduction collectives with data partitioning-based multi-leader design.

Proceedings of the International Conference for High Performance Computing, 2017

An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures.

Proceedings of the Machine Learning on HPC Environments, 2017

MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU.

Proceedings of the 24th European MPI Users' Group Meeting, 2017

S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Exploiting and Evaluating OpenSHMEM on KNL Architecture.

Mingzhe Li

Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Introduction to HPBDC Workshop.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling.

Proceedings of the 46th International Conference on Parallel Processing, 2017

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning.

Bracy Elton

Proceedings of the 46th International Conference on Parallel Processing, 2017

High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads.

Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017

Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-Capable Networks.

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand.

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors.

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI.

Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud.

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

NVMD: Non-volatile memory assisted design for accelerating MapReduce and DAG execution frameworks on HPC systems.

Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Performance characterization and acceleration of big data workloads on OpenPOWER system.

Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Characterizing and accelerating indexing techniques on distributed ordered tables.

Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

Characterization of Big Data Stream Processing Pipeline: A Case Study using Flink and Kafka.

M. Haseeb Javed

Proceedings of the Fourth IEEE/ACM International Conference on Big Data Computing, 2017

Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach.

Proceedings of the Research Advances in Cloud Computing, 2017

2016

Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters.

J. Supercomput., 2016

CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters.

Parallel Comput., 2016

Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet.

Proceedings of the XSEDE16 Conference on Diversity, 2016

INAM2: InfiniBand Network Analysis and Monitoring with MPI.

Albert Mathews Augustine

Proceedings of the High Performance Computing - 31st International Conference, 2016

Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters?

Proceedings of the 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems, 2016

Designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits.

Proceedings of the International Conference for High Performance Computing, 2016

OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences.

Proceedings of the 2016 PGAS Applications Workshop, 2016

Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications.

Proceedings of the First International Workshop on Communication Optimizations in HPC, 2016

MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers.

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters.

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning.

Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Designing high performance communication runtime for GPU managed memory: early experiences.

Dip Sankar Banerjee

Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016

Performance Characterization of Hypervisor- and Container-Based Virtualization for HPC on SR-IOV Enabled InfiniBand Clusters.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

HPBDC Introduction and Committees.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

High Performance Design for HDFS with Byte-Addressability of NVM and RDMA.

Proceedings of the 2016 International Conference on Supercomputing, 2016

High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters.

Proceedings of the 45th International Conference on Parallel Processing, 2016

System-Level Scalable Checkpoint-Restart for Petascale Computing.

Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models.

Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications; 14th IEEE International Conference on Smart City; 2nd IEEE International Conference on Data Science and Systems, 2016

Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA.

Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC.

Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016

Slurm-V: Extending Slurm for Building Efficient HPC Cloud with SR-IOV and IVShmem.

Proceedings of the Euro-Par 2016: Parallel Processing, 2016

Adaptive and Dynamic Design for MPI Tag Matching.

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase.

Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016

Designing Virtualization-Aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds.

Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016

Re-Designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters.

Dip Sankar Banerjee

Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science, 2016

CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters.

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

SHMEMPMI - Shared Memory Based PMI for Improved Performance and Scalability.

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Boldio: A hybrid and resilient burst-buffer over Lustre for accelerating big data I/O.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

High-performance design of Apache Spark with RDMA and its benefits on various workloads.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Efficient data access strategies for Hadoop and Spark on HPC cluster with heterogeneous storage.

Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Performance characterization of Hadoop workloads on SR-IOV-enabled virtualized InfiniBand clusters.

Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, 2016

2015

Accelerating Big Data Processing on Modern Clusters.

Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems, 2015

Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters.

Proceedings of the High Performance Computing - 30th International Conference, 2015

A case for application-oblivious energy-efficient MPI runtime.

Proceedings of the International Conference for High Performance Computing, 2015

GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks.

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Scalable Out-of-core OpenSHMEM Library for HPC.

Antonio Gómez-Iglesias

Christopher S. Simmons

William L. Barth

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Can RDMA benefit online data processing workloads on memcached and MySQL?

Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA.

Nusrat Sharmin Islam

Raghunath Raja Chandrasekar

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store.

Proceedings of the 44th International Conference on Parallel Processing, 2015

Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms.

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters.

Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR.

Proceedings of the 22nd IEEE International Conference on High Performance Computing, 2015

High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters.

Proceedings of the Euro-Par 2015: Parallel Processing, 2015

High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Non-Blocking PMI Extensions for Fast MPI Startup.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS.

Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2015

Benchmarking key-value stores on high-performance storage and interconnects for web-scale workloads.

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters.

Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

2014

GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation.

IEEE Trans. Parallel Distributed Syst., 2014

A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks.

Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2014

Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences.

Proceedings of the Supercomputing - 29th International Conference, 2014

Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface.

Proceedings of the 21st European MPI Users' Group Meeting, 2014

PMI Extensions for Scalable MPI Startup.

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Scalable MiniMD Design with Hybrid MPI and OpenSHMEM.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters.

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing Collective Communication in UPC.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects.

Proceedings of the 2014 International Conference on Supercomputing, 2014

Performance Modeling for RDMA-Enhanced Hadoop MapReduce.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Message from the general co-chairs IEEE ICPADS 2014.

Jang-Ping Sheu

Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Wide-area overlay networking to manage science DMZ accelerated flows.

Proceedings of the International Conference on Computing, Networking and Communications, 2014

MIC-Check: a distributed checkpointing framework for the Intel many integrated cores architecture.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Accelerating Spark with RDMA for Big Data Processing: Early Experiences.

Proceedings of the 22nd IEEE Annual Symposium on High-Performance Interconnects, 2014

High performance MPI library over SR-IOV enabled InfiniBand clusters.

Proceedings of the 21st International Conference on High Performance Computing, 2014

A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on InfiniBand clusters.

Proceedings of the 21st International Conference on High Performance Computing, 2014

Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters.

Proceedings of the 21st International Conference on High Performance Computing, 2014

Can Inter-VM Shmem Benefit MPI Applications on SR-IOV Based Virtualized InfiniBand Clusters?

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

MapReduce over Lustre: Can RDMA-Based Approach Benefit?

Nusrat Sharmin Islam

Proceedings of the Euro-Par 2014 Parallel Processing, 2014

Scalable Graph500 design with MPI-3 RMA.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

In-memory I/O and replication for HDFS with Memcached: Early experiences.

Nusrat Sharmin Islam

Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

2013

Redesigning MPI shared memory communication for large multi-core architecture.

Comput. Sci. Res. Dev., 2013

A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks.

Proceedings of the Advancing Big Data Benchmarks, 2013

Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models.

[BibT_eX]

[DOI]

Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

MetaData persistence using storage class memory: experiences with flash-backed DRAM.

[BibT_eX]

[DOI]

Proceedings of the 1st Workshop on Interactions of NVM/FLASH with Operating Systems and Workloads, 2013

MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters.

[BibT_eX]

[DOI]

Proceedings of the International Conference for High Performance Computing, 2013

Efficient and truly passive MPI-3 RMA using InfiniBand atomics.

[BibT_eX]

[DOI]

Proceedings of the 20th European MPI Users' Group Meeting, 2013

High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Extending OpenSHMEM for GPU Computing.

[BibT_eX]

[DOI]

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Supercomputing, 2013

Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs.

[BibT_eX]

[DOI]

Proceedings of the 42nd International Conference on Parallel Processing, 2013

High-Performance Design of Hadoop RPC with RDMA over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 42nd International Conference on Parallel Processing, 2013

A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems.

[BibT_eX]

[DOI]

Proceedings of the 42nd International Conference on Parallel Processing, 2013

A 1 PB/s file system to checkpoint three million MPI tasks.

[BibT_eX]

[DOI]

Adam Moody

Kathryn M. Mohror

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters.

[BibT_eX]

[DOI]

Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?

[BibT_eX]

[DOI]

Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Tutorials.

[BibT_eX]

[DOI]

Proceedings of the IEEE 21st Annual Symposium on High-Performance Interconnects, 2013

Design of network topology aware scheduling services for large InfiniBand clusters.

[BibT_eX]

[DOI]

Devendar Bureddy

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters.

[BibT_eX]

[DOI]

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Does RDMA-based enhanced Hadoop MapReduce need a new performance model?

[BibT_eX]

[DOI]

Proceedings of the ACM Symposium on Cloud Computing, SOCC '13, 2013

Efficient Intra-node Communication on Intel-MIC Clusters.

[BibT_eX]

[DOI]

Devendar Bureddy

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience.

[BibT_eX]

[DOI]

Mingzhe Li

Mark Daniel Arnold

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters.

[BibT_eX]

[DOI]

Proceedings of the Specifying Big Data Benchmarks, 2012

Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes.

[BibT_eX]

[DOI]

Proceedings of the SC Conference on High Performance Computing Networking, 2012

High performance RDMA-based design of HDFS over InfiniBand.

[BibT_eX]

[DOI]

Nusrat S. Islam

Proceedings of the SC Conference on High Performance Computing Networking, 2012

OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Understanding the communication characteristics in HBase: What are the fundamental bottlenecks?

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI.

[BibT_eX]

[DOI]

Xavier Besseron

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Designing Network Failover and Recovery in MPI for Multi-Rail InfiniBand Clusters.

[BibT_eX]

[DOI]

S. Pai Raikar

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication.

[BibT_eX]

[DOI]

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers.

[BibT_eX]

[DOI]

Bronis R. de Supinski

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

High-Performance Design of HBase with RDMA over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Congestion avoidance on manycore high performance computing systems.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Supercomputing, 2012

SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks.

[BibT_eX]

[DOI]

Nusrat S. Islam

Proceedings of the 41st International Conference on Parallel Processing, 2012

Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the 41st International Conference on Parallel Processing, 2012

Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing Systems.

[BibT_eX]

[DOI]

Proceedings of the IEEE 20th Annual Symposium on High-Performance Interconnects, 2012

A Scalable InfiniBand Network Topology-Aware Performance Analysis Tool for MPI.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework.

[BibT_eX]

[DOI]

Jai Jaswani

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms?

[BibT_eX]

[DOI]

Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports.

[BibT_eX]

[DOI]

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011

Collective Communication, Network Support For.

[BibT_eX]

[DOI]

Proceedings of the Encyclopedia of Parallel Computing, 2011

InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the Encyclopedia of Parallel Computing, 2011

MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters.

[BibT_eX]

[DOI]

Comput. Sci. Res. Dev., 2011

High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT.

[BibT_eX]

[DOI]

Comput. Sci. Res. Dev., 2011

Codesign for InfiniBand Clusters.

[BibT_eX]

[DOI]

Karen Tomko

Computer, 2011

Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in the Message Passing Interface, 2011

CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Parallel Processing, 2011

Memcached Design on High Performance RDMA Capable Interconnects.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Parallel Processing, 2011

Beyond block I/O: Rethinking traditional storage primitives.

[BibT_eX]

[DOI]

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL.

[BibT_eX]

[DOI]

Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Multi-threaded UPC runtime with network endpoints: Design alternatives and evaluation on multi-core architectures.

[BibT_eX]

[DOI]

Proceedings of the 18th International Conference on High Performance Computing, 2011

Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging?

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool.

[BibT_eX]

[DOI]

N. Dandapanthula

Ron Brightwell

Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters.

[BibT_eX]

[DOI]

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit.

[BibT_eX]

[DOI]

Ashish Kumar Singh

Hao Wang

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems?

[BibT_eX]

[DOI]

Vilobh Meshram

Xavier Besseron

Ravi Prakash

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

High Performance Pipelined Process Migration with RDMA.

[BibT_eX]

[DOI]

Xavier Besseron

Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010

Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems.

[BibT_eX]

[DOI]

Comput. Sci. Res. Dev., 2010

Scalable Earthquake Simulation on Petascale Supercomputers.

[BibT_eX]

[DOI]

Proceedings of the Conference on High Performance Computing Networking, 2010

Unifying UPC and MPI runtimes: experience with MVAPICH.

[BibT_eX]

[DOI]

Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, 2010

Designing high-performance and resilient message passing on InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather.

[BibT_eX]

[DOI]

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application.

[BibT_eX]

[DOI]

Proceedings of the 24th International Conference on Supercomputing, 2010

High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2.

[BibT_eX]

[DOI]

Emilio Pasquale Mancini

Proceedings of the 39th International Conference on Parallel Processing, 2010

Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters.

[BibT_eX]

[DOI]

Proceedings of the 39th International Conference on Parallel Processing, 2010

Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters.

[BibT_eX]

[DOI]

Emilio Pasquale Mancini

Proceedings of the 39th International Conference on Parallel Processing, 2010

Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine.

[BibT_eX]

[DOI]

Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

Designing High-End Computing Systems with InfiniBand and High-Speed Ethernet.

[BibT_eX]

[DOI]

Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

RDMA-Based Job Migration Framework for MPI over InfiniBand.

[BibT_eX]

[DOI]

Sonya Marcarelli

Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

An MPI-Stream Hybrid Programming Model for Computational Clusters.

[BibT_eX]

[DOI]

Emilio Pasquale Mancini

Gregory Marsh

Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009

IPDPS 2007: Comments from the Guest Editor.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 2009

ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures.

[BibT_eX]

[DOI]

Comput. Sci. Res. Dev., 2009

Topology agnostic hot-spot avoidance with InfiniBand.

[BibT_eX]

[DOI]

Concurr. Comput. Pract. Exp., 2009

Impact of Node Level Caching in MPI Job Launch Mechanisms.

[BibT_eX]

[DOI]

Jaidev K. Sridhar

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand.

[BibT_eX]

[DOI]

Jaidev K. Sridhar

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Designing multi-leader-based Allgather algorithms for multi-core clusters.

[BibT_eX]

[DOI]

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Designing and Evaluating MPI-2 Dynamic Process Management Support for InfiniBand.

[BibT_eX]

[DOI]

Tejus Gangadharappa

Proceedings of the ICPPW 2009, 2009

Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems.

[BibT_eX]

[DOI]

Karthik Gopalakrishnan

Proceedings of the ICPP 2009, 2009

Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the ICPP 2009, 2009

CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems.

[BibT_eX]

[DOI]

Proceedings of the ICPP 2009, 2009

Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms.

[BibT_eX]

[DOI]

Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Tutorial: Designing High-End Computing Systems with InfiniBand and 10-Gigabit Ethernet.

[BibT_eX]

[DOI]

Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Tutorial: InfiniBand and 10-Gigabit Ethernet for Dummies.

[BibT_eX]

[DOI]

Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture.

[BibT_eX]

[DOI]

Karthik Gopalakrishnan

Tejus Gangadharappa

Proceedings of the 16th International Conference on High Performance Computing, 2009

An efficient hardware-software approach to network fault tolerance with InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

RDMA over Ethernet - A preliminary study.

[BibT_eX]

[DOI]

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

Design alternatives for implementing fence synchronization in MPI-2 one-sided communication for InfiniBand clusters.

[BibT_eX]

[DOI]

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

Reducing network contention with mixed workloads on modern multicore clusters.

[BibT_eX]

[DOI]

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

Natively Supporting True One-Sided Communication in.

[BibT_eX]

[DOI]

Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

2008

Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication.

[BibT_eX]

[DOI]

Rahul Kumar

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Designing passive synchronization for MPI-2 one-sided communication to maximize overlap.

[BibT_eX]

[DOI]

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Scaling alltoall collective on multi-core systems.

[BibT_eX]

[DOI]

Rahul Kumar

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand.

[BibT_eX]

[DOI]

Terry R. Jones

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Can software reliability outperform hardware reliability on high performance interconnects?: a case study with MPI over InfiniBand.

[BibT_eX]

[DOI]

Rahul Kumar

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

IMCa: A High Performance Caching Front-End for GlusterFS on InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Performance of HPC Middleware over InfiniBand WAN.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems.

[BibT_eX]

[DOI]

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand.

[BibT_eX]

[DOI]

Karthik Gopalakrishnan

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

ScELA: Scalable and Extensible Launching Architecture for Clusters.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2008

Designing a High-Performance Clustered NAS: A Case Study with pNFS over RDMA on InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2008

Sockets Direct Protocol for Hybrid Network Stacks: A Case Study with iWARP over 10G Ethernet.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2008

Designing next generation clusters with InfiniBand and 10GE/iWARP: Opportunities and challenges.

[BibT_eX]

[DOI]

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Scalable MPI design over InfiniBand using eXtended Reliable Connection.

[BibT_eX]

[DOI]

Jaidev K. Sridhar

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Efficient one-copy MPI shared memory communication in Virtual Machines.

[BibT_eX]

[DOI]

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Optimized Distributed Data Sharing Substrate in Multi-core Commodity Clusters: A Comprehensive Study with Applications.

[BibT_eX]

[DOI]

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics.

[BibT_eX]

[DOI]

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

Advanced RDMA-Based Admission Control for Modern Data-Centers.

[BibT_eX]

[DOI]

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007

Nomad: migrating OS-bypass networks in virtual machines.

[BibT_eX]

[DOI]

Proceedings of the 3rd International Conference on Virtual Execution Environments, 2007

Virtual machine aware communication libraries for high performance computing.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements.

[BibT_eX]

[DOI]

Feng Qin

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

pNFS/PVFS2 over InfiniBand: early experiences.

[BibT_eX]

[DOI]

Proceedings of the 2nd International Petascale Data Storage Workshop (PDSW '07), 2007

Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

MPI-2 One-Sided Usage and Implementation for Read Modify Write Operations: A Case Study with HPCC.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact.

[BibT_eX]

[DOI]

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Benefits of I/O Acceleration Technology (I/OAT) in Clusters.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Automatic Path Migration over InfiniBand: Early Experiences.

[BibT_eX]

[DOI]

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High Performance MPI on IBM 12x InfiniBand Architecture.

[BibT_eX]

[DOI]

Brad Benton

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers.

[BibT_eX]

[DOI]

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT.

[BibT_eX]

[DOI]

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support.

[BibT_eX]

[DOI]

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual International Conference on Supercomputing, 2007

Designing NFS with RDMA for Security, Performance and Scalability.

[BibT_eX]

[DOI]

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

High Performance MPI over iWARP: Early Experiences.

[BibT_eX]

[DOI]

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Group-based Coordinated Checkpointing for MPI: A Case Study on InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms.

[BibT_eX]

[DOI]

Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007

Efficient asynchronous memory copy operations on multi-core systems and I/OAT.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Designing high-end computing systems with InfiniBand and 10-Gigabit Ethernet iWARP.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Zero-copy protocol for MPI using InfiniBand unreliable datagram.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

High performance virtual machine migration with RDMA over modern interconnects.

[BibT_eX]

[DOI]

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective.

[BibT_eX]

[DOI]

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations.

[BibT_eX]

[DOI]

A. Marnidala

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach.

[BibT_eX]

[DOI]

Terry R. Jones

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System.

[BibT_eX]

[DOI]

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006

Bridging the Ethernet-Ethernot Performance Gap.

[BibT_eX]

[DOI]

Wu-chun Feng

IEEE Micro, 2006

NIC-based reduction algorithms for large-scale clusters.

[BibT_eX]

[DOI]

Fabrizio Petrini

Adam Moody

Juan Fernández Peinador

Eitan Frachtenberg

Int. J. High Perform. Comput. Netw., 2006

High Performance Remote Memory Access Communication: The ARMCI Approach.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2006

High Performance VMM-Bypass I/O in Virtual Machines.

[BibT_eX]

[DOI]

Proceedings of the 2006 USENIX Annual Technical Conference, 2006

Scalable systems software - A software based approach for providing network fault tolerance in clusters with uDAPL interface: MPI level design and performance evaluation.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

MPI and communication - High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Panel: Data intensive computing.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits.

[BibT_eX]

[DOI]

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Benefits of high speed interconnects to cluster file systems: a case study with Lustre.

[BibT_eX]

[DOI]

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Adaptive connection management for scalable MPI over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Shared receive queue based scalable MPI design for InfiniBand clusters.

[BibT_eX]

[DOI]

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast.

[BibT_eX]

[DOI]

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Designing next generation data-centers with advanced communication protocols and systems services.

[BibT_eX]

[DOI]

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Asynchronous zero-copy communication for synchronous sockets in the sockets direct protocol (SDP) over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A case for high performance computing with virtual machines.

[BibT_eX]

[DOI]

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA.

[BibT_eX]

[DOI]

Shuang Liang

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

NemC: A Network Emulator for Cluster-of-Clusters.

[BibT_eX]

[DOI]

Proceedings of the 15th International Conference On Computer Communications and Networks, 2006

Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 14th IEEE Symposium on High-Performance Interconnects, 2006

DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2006

Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers.

[BibT_eX]

[DOI]

Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters.

[BibT_eX]

[DOI]

Albert Hartono

Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks.

[BibT_eX]

[DOI]

Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Design of High Performance MVAPICH2: MPI2 over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

MPI over uDAPL: Can High Performance and Portability Exist Across Architectures?

[BibT_eX]

[DOI]

Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005

Evaluating InfiniBand Performance with PCI Express.

[BibT_eX]

[DOI]

IEEE Micro, 2005

Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 2005

Selective preemption strategies for parallel job scheduling.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Netw., 2005

High Performance Broadcast Support in LA-MPI Over Quadrics.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2005

Designing Zero-Copy Message Passing Interface Derived Datatype Communication Over InfiniBand: Alternative Approaches and Performance Evaluation.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Appl., 2005

Efficient Hardware Multicast Group Management for Multiple MPI Communicators over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Design Alternatives and Performance Trade-Offs for Implementing MPI-2 over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

On the provision of prioritization and soft QoS in dynamically reconfigurable shared data-centers over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Design and Implementation of Open MPI over Quadrics/Elan4.

[BibT_eX]

[DOI]

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM.

[BibT_eX]

[DOI]

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Scheduling of MPI-2 One Sided Operations over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Analysis of Design Considerations for Optimizing Multi-Channel MPI over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

High performance support of parallel virtual file system (PVFS2) over Quadrics.

[BibT_eX]

[DOI]

Shuang Liang

Proceedings of the 19th Annual International Conference on Supercomputing, 2005

LiMIC: Support for High-Performance MPI Intra-node Communication on Linux Cluster.

[BibT_eX]

[DOI]

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems?

[BibT_eX]

[DOI]

Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Performance Characterization of a 10-Gigabit Ethernet TOE.

[BibT_eX]

[DOI]

Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2005

High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2005

Performance Evaluation of MM5 on Clusters with Modern Interconnects: Scalability and Impact.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device.

[BibT_eX]

[DOI]

Shuang Liang

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Supporting iWARP Compatibility and Features for Regular Network Adapters.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines.

[BibT_eX]

[DOI]

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Can high performance software DSM systems designed with InfiniBand features benefit from PCI-Express?

[BibT_eX]

[DOI]

Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005

Architecture for caching responses with multiple dynamic dependencies in multi-tier data-centers over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005

2004

Microbenchmark Performance Comparison of High-Speed Cluster Interconnects.

[BibT_eX]

[DOI]

IEEE Micro, 2004

High Performance RDMA-Based MPI Implementation over InfiniBand.

[BibT_eX]

[DOI]

Int. J. Parallel Program., 2004

Application-bypass reduction for large-scale clusters.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Netw., 2004

Optimisation and performance evaluation of mechanisms for latency tolerance in remote memory access communication on clusters.

[BibT_eX]

[DOI]

Int. J. High Perform. Comput. Netw., 2004

Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Zero-Copy MPI Derived Datatype Communication over InfiniBand.

[BibT_eX]

[DOI]

Dhabaleswar Wu

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Sockets Direct Protocol over InfiniBand in clusters: is it beneficial?

[BibT_eX]

[DOI]

Savitha Krishnamoorthy

Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol.

[BibT_eX]

[DOI]

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

High Performance Implementation of MPI Derived Datatype Communication over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Host-Assisted Zero-Copy Remote Memory Access Communication on InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Fast and Scalable MPI-Level Broadcast Using InfiniBand's Hardware Multicast Support.

[BibT_eX]

[DOI]

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Design and Implementation of MPICH2 over InfiniBand with RDMA Support.

[BibT_eX]

[DOI]

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Applying MPI Derived Datatypes to the NAS Benchmarks: A Case Study.

[BibT_eX]

[DOI]

Proceedings of the 33rd International Conference on Parallel Processing Workshops (ICPP 2004 Workshops), 2004

Efficient and Scalable All-to-All Personalized Exchange for InfiniBand-Based Clusters.

[BibT_eX]

[DOI]

Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Performance evaluation of InfiniBand with PCI Express.

[BibT_eX]

[DOI]

Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects, 2004

Fast and Scalable Startup of MPI Programs in InfiniBand Clusters.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2004

Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

NIC-based offload of dynamic user-defined modules for Myrinet clusters.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

State of InfiniBand in designing HPC clusters, storage/file systems, and datacenters [datacenters read as data centers].

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

Towards provision of quality of service guarantees in job scheduling.

[BibT_eX]

[DOI]

Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

Unifier: unifying cache management and communication buffer management for PVFS over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

Designing high performance DSM systems using InfiniBand features.

[BibT_eX]

[DOI]

Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

High performance MPI-2 one-sided communication over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 4th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004

2003

Demotion-based exclusive caching through demote buffering: design and evaluations over different networks.

[BibT_eX]

[DOI]

Proceedings of the International Workshop on Storage Network Architecture and Parallel I/Os, 2003

Scalable NIC-based Reduction on Large-scale Clusters.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics.

[BibT_eX]

[DOI]

Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Fast and Scalable Barrier Using RDMA and Multicast Mechanisms for InfiniBand-Based Clusters.

[BibT_eX]

[DOI]

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003

Towards NIC-based intrusion detection.

[BibT_eX]

[DOI]

Matthew Eric Otey

Srinivasan Parthasarathy

Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24, 2003

QoPS: A QoS Based Scheme for Parallel Job Scheduling.

[BibT_eX]

[DOI]

Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003

Fast Collective Operations Using Shared and Remote Memory Access Protocols on Clusters.

[BibT_eX]

[DOI]

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Implementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Efficient Collective Operations Using Remote Memory Operations on VIA-Based Clusters.

[BibT_eX]

[DOI]

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Optimizing Synchronization Operations for Remote Memory Communication Systems.

[BibT_eX]

[DOI]

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

High performance RDMA-based MPI implementation over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 17th Annual International Conference on Supercomputing, 2003

High Performance and Reliable NIC-Based Multicast over Myrinet/GM-2.

[BibT_eX]

[DOI]

Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

PVFS over InfiniBand: Design and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

QoS-Aware Middleware for Cluster-Based Servers to support Interactive and Resource-Adaptive Applications.

[BibT_eX]

[DOI]

Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

Impact of High Performance Sockets on Data Intensive Applications.

[BibT_eX]

[DOI]

Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

Micro-benchmark level performance comparison of high-speed cluster interconnects.

[BibT_eX]

[DOI]

Balasubramanian Chandrasekaran

Proceedings of the 11th Annual IEEE Symposium on High Performance Interconnects, 2003

Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

MIBA: A Micro-Benchmark Suite for Evaluating InfiniBand Architecture Implementations.

[BibT_eX]

[DOI]

B. Chandrasekaran

Proceedings of the Computer Performance Evaluations, 2003

Supporting Efficient Noncontiguous Access in PVFS over InfiniBand.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

Designing Next Generation Clusters with Infiniband: Opportunities and Challenges.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters.

[BibT_eX]

[DOI]

Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003

Application-Bypass Broadcast in MPICH over GM.

[BibT_eX]

[DOI]

Ron Brightwell

Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002

HIPIQS: A High-Performance Switch Architecture Using Input Queuing.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2002

Feature estimation for efficient streaming.

[BibT_eX]

[DOI]

Naveen Kumar Polapally

Raghu Machiraju

Proceedings of the IEEE/SIGGRAPH Symposium on Volume Visualization and Graphics, 2002

Active Network Interface: Opportunities and Challenges.

[BibT_eX]

[DOI]

Proceedings of the 27th Annual IEEE Conference on Local Computer Networks (LCN 2002), 2002

MPI/IO on DAFS over VIA: Implementation and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Can User-Level Protocols Take Advantage of Multi-CPU NICs?

[BibT_eX]

[DOI]

Piyush Shivam

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Workshop Introduction.

[BibT_eX]

[DOI]

José Duato

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Protocols and Strategies for Optimizing Performance of Remote Memory Operations on Clusters.

[BibT_eX]

[DOI]

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

A Reliable Multicast Algorithm for Mobile Ad Hoc Networks.

[BibT_eX]

[DOI]

Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), 2002

Tutorial 2: InfiniBand Architecture and Where it is Headed.

[BibT_eX]

[DOI]

Doddaballapur Narasimha-Murthy Jayasimha

Proceedings of the 10th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2002), August 21, 2002

Impact of On-Demand Connection Management in MPI over VIA.

[BibT_eX]

[DOI]

Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters.

[BibT_eX]

[DOI]

Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

High Performance User Level Sockets over Gigabit Ethernet.

[BibT_eX]

[DOI]

Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001

Hybrid Algorithms for Complete Exchange in 2D Meshes.

[BibT_eX]

[DOI]

N. S. Sundar

IEEE Trans. Parallel Distributed Syst., 2001

Architectural Support for Efficient Multicasting in Irregular Networks.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2001

Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2001

MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2001

Design Alternatives for Virtual Interface Architecture and an Implementation on IBM Netfinity NT Cluster.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 2001

Adaptive Routing on the New Switch Chip for IBM SP Systems.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 2001

EMP: zero-copy OS-bypass NIC-driven gigabit ethernet message passing.

[BibT_eX]

[DOI]

Piyush Shivam

Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations.

[BibT_eX]

[DOI]

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Performance Benefits of NIC-Based Barrier on Myrinet/GM.

[BibT_eX]

[DOI]

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Fast NIC-Based Barrier over Myrinet/GM.

[BibT_eX]

[DOI]

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations.

[BibT_eX]

[DOI]

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

NIC-Based Rate Control for Proportional Bandwidth Allocation in Myrinet Clusters.

[BibT_eX]

[DOI]

Proceedings of the 2001 International Conference on Parallel Processing, 2001

Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the 2001 International Conference on Parallel Processing, 2001

2000

Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and Their Impact.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2000

Adaptive Routing in RS/6000 SP-Like Bidirectional Multistage Interconnection Networks.

[BibT_eX]

[DOI]

Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Efficient Virtual Interface Architecture (VIA) Support for the IBM SP Switch-Connected NT Clusters.

[BibT_eX]

[DOI]

Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Characterization and Enhancement of Dynamic Mapping Heuristics for Heterogeneous Systems.

[BibT_eX]

[DOI]

Proceedings of the 2000 International Workshop on Parallel Processing, 2000

Balancing Web Server Load for Adaptable Video Distribution.

[BibT_eX]

[DOI]

Proceedings of the 2000 International Workshop on Parallel Processing, 2000

Characterization and enhancement of Static Mapping Heuristics for Heterogeneous Systems.

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2000

Can Scatter Communication Take Advantage of Multidestination Message Passing?

[BibT_eX]

[DOI]

Proceedings of the High Performance Computing, 2000

Fast Collective Communication Algorithms for Reflective Memory Network Clusters.

[BibT_eX]

[DOI]

Vijay Moorthy

Proceedings of the Network-Based Parallel Computing: Communication, 2000

Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages.

[BibT_eX]

[DOI]

Proceedings of the Network-Based Parallel Computing: Communication, 2000

Comparison and Evaluation of Design Choices for Implementing the Virtual Interface Architecture (VIA).

[BibT_eX]

[DOI]

Bülent Abali

Proceedings of the Network-Based Parallel Computing: Communication, 2000

1999

Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths.

[BibT_eX]

[DOI]

Sanjay Singal

IEEE Trans. Parallel Distributed Syst., 1999

Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1999

Exploiting the Benefits of Multiple-Path Network DSM Systems: Architectural Alternatives and Performance Evaluation.

[BibT_eX]

[DOI]

IEEE Trans. Computers, 1999

Low-Latency Message Passing on Workstation Clusters using SCRAMNet.

[BibT_eX]

[DOI]

Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

All-to-All Broadcast on Switch-Based Clusters of Workstations.

[BibT_eX]

[DOI]

Matthew G. Jacunski

Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

Implementing Efficient MPI on LAPI for IBM RS/6000 SP Systems: Experiences and Performance Evaluation.

[BibT_eX]

[DOI]

Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

Communication Modeling of Heterogeneous Networks of Workstations for Performance Characterization of Collective Operations.

[BibT_eX]

[DOI]

Jayanthi Sampathkumar

Sandeep Prabhu

Proceedings of the 8th Heterogeneous Computing Workshop, 1999

Low Latency Message-Passing for Reflective Memory Networks.

[BibT_eX]

[DOI]

Proceedings of the Network-Based Parallel Computing: Communication, 1999

1998

Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1998

Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1998

Designing communication strategies for heterogeneous parallel systems.

[BibT_eX]

[DOI]

Ravi Prakash

Parallel Comput., 1998

Experiences with Software MPEG-2 Video Decompression on an SMP PC.

[BibT_eX]

[DOI]

Proceedings of the 1998 International Conference on Parallel Processing Workshops, 1998

Where to Provide Support for Efficient Multicasting in Irregular Networks: Network Interface or Switch?

[BibT_eX]

[DOI]

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Impact of Adaptivity on the Behaviour of Networks of Workstations under Bursty Traffic.

[BibT_eX]

[DOI]

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Efficient Collective Communication on Heterogeneous Networks of Workstations.

[BibT_eX]

[DOI]

Vijay Moorthy

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

1997

Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1997

Special Issue on Workstation Clusters and Network-Based Computing: Guest Editors' Introduction.

[BibT_eX]

[DOI]

Lionel M. Ni

J. Parallel Distributed Comput., 1997

Simulation of Modern Parallel Systems: A CSIM-based Approach.

[BibT_eX]

[DOI]

Proceedings of the 29th conference on Winter simulation, 1997

Multicasting in Irregular Networks with Cut-Through Switches Using Tree-Based Multidestination Worms.

[BibT_eX]

[DOI]

Proceedings of the Parallel Computer Routing and Communication, 1997

Designing High-Performance Communication Subsystems: Top Five Problems to Solve and Five Problems Not to Solve During the Next Five Years (Panel).

[BibT_eX]

[DOI]

Proceedings of the Parallel Computer Routing and Communication, 1997

Multicasting on Switch-Based Irregular Networks Using Multi-drop Path-Based Multidestination Worms.

[BibT_eX]

[DOI]

Proceedings of the Parallel Computer Routing and Communication, 1997

How Can We Design Better Networks for DSM Systems?

[BibT_eX]

[DOI]

Proceedings of the Parallel Computer Routing and Communication, 1997

A Reliable Hardware Barrier Synchronization Scheme.

[BibT_eX]

[DOI]

Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

Optimal Multicast with Packetization and Network Interface Support.

[BibT_eX]

[DOI]

Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

How Much Does Network Contention Affect Distributed Shared Memory Performance?

[BibT_eX]

[DOI]

Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Multicast on Irregular Switch-Based Networks with Wormhole Routing.

[BibT_eX]

[DOI]

Kiran Bondalapati

Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture (HPCA '97), 1997

Prioritized demand multiplexing (PDM): a low-latency virtual channel flow control framework for prioritized traffic.

[BibT_eX]

[DOI]

Abdel-Halim Smai

Lars-Erik Thorelli

Proceedings of the Fourth International Conference on High-Performance Computing, 1997

1996

A Trip-Based Multicasting Model in Wormhole-Routed Networks with Virtual Channels.

[BibT_eX]

[DOI]

Yu-Chee Tseng

Ten-Hwang Lai

IEEE Trans. Parallel Distributed Syst., 1996

Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1996

Benefits of Processor Clustering in Designing Large Parallel Systems: When and How?

[BibT_eX]

[DOI]

Doddaballapur Narasimha-Murthy Jayasimha

Proceedings of IPPS '96, 1996

Hybrid Algorithms for Complete Exchange in 2D Meshes.

[BibT_eX]

[DOI]

N. S. Sundar

Proceedings of the 10th international conference on Supercomputing, 1996

Minimizing Node Contention in Multiple Multicast on Wormhole k-ary N-Cube Networks.

[BibT_eX]

[DOI]

Proceedings of the 1996 International Conference on Parallel Processing, 1996

Reducing Cache Invalidation Overheads in Wormhole Routed DSMs Using Multidestination Message Passing.

[BibT_eX]

[DOI]

Proceedings of the 1996 International Conference on Parallel Processing, 1996

Designing Processor-Cluster Based Systems: Interplay Between Organizations and Broadcasting Algorithms.

[BibT_eX]

[DOI]

Proceedings of the 1996 International Conference on Parallel Processing, 1996

1995

Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms.

[BibT_eX]

[DOI]

Future Gener. Comput. Syst., 1995

An efficient scheme for complete exchange in 2D tori.

[BibT_eX]

[DOI]

Yu-Chee Tseng

Sandeep K. S. Gupta

Proceedings of IPPS '95, 1995

Global reduction in wormhole k-ary n-cube networks with multidestination exchange worms.

[BibT_eX]

[DOI]

Proceedings of IPPS '95, 1995

1994

Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme.

[BibT_eX]

[DOI]

Sanjay Singal

Pradeep Prabhakaran

Proceedings of the Parallel Computer Routing and Communication, 1994

Architectural issues in designing heterogeneous parallel systems with passive star-coupled optical interconnection.

[BibT_eX]

[DOI]

Ravi Prakash

Proceedings of the International Symposium on Parallel Architectures, 1994

Clustering and Intra-Processor Scheduling for Explicitly-Parallel Programs on Distributed-Memory Systems.

[BibT_eX]

[DOI]

Vibha A. Dixit-Radiya

Proceedings of the 8th International Symposium on Parallel Processing, 1994

Designing Large Hierarchical Multiprocessor Systems under Processor, Interconnection, and Packaging Advancements.

[BibT_eX]

[DOI]

Proceedings of the 1994 International Conference on Parallel Processing, 1994

1993

Task Assignment on Distributed-Memory Systems with Adaptive Wormhole Routing.

[BibT_eX]

[DOI]

Vibha A. Dixit-Radiya

Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

Scalable Architectures with k-ary n-Cube Cluster-c organization.

[BibT_eX]

[DOI]

Proceedings of the Fifth IEEE Symposium on Parallel and Distributed Processing, 1993

A Trip-Based Multicasting Model for Wormhole-Routed Networks with Virtual Channels.

[BibT_eX]

[DOI]

Yu-Chee Tseng

Proceedings of the Seventh International Parallel Processing Symposium, 1993

Barrier Synchronization in Distributed-Memory Multiprocessing Using Rendezvous Primitives.

[BibT_eX]

[DOI]

Sandeep K. S. Gupta

Proceedings of the Seventh International Parallel Processing Symposium, 1993

Impact of Multiple Consumption Channels on Wormhole Routed k-ary n-cube Networks.

[BibT_eX]

[DOI]

Shobana Balakrishnan

Proceedings of the Seventh International Parallel Processing Symposium, 1993

1991

Fast Data Manipulation in Multiprocessors Using Parallel Pipelined Memories.

[BibT_eX]

[DOI]

J. Parallel Distributed Comput., 1991

Architectural Design of Orthogonal Multiprocessor for Multidimensional Information Processing.

[BibT_eX]

[DOI]

J. Inf. Sci. Eng., 1991

Message Vectorization for Converting Multicomputer Programs to Shared-Memory Multiprocessors.

[BibT_eX]

Proceedings of the International Conference on Parallel Processing, 1991

1990

OMP: a RISC-based multiprocessor using orthogonal-access memories and multiple spanning buses.

[BibT_eX]

[DOI]

Proceedings of the 4th international conference on Supercomputing, 1990

Algorithm-Driven Simulation and Performance Projection of a RISC-based Orthogonal Multiprocessor.

[BibT_eX]

Proceedings of the 1990 International Conference on Parallel Processing, 1990

Reconfigurable vector register windows for fast matrix computation on the orthogonal multiprocessor.

[BibT_eX]

[DOI]

Dhabaleswar Kumar Panda

Proceedings of the Application Specific Array Processors, 1990

1989

Optical arithmetic using high-radix symbolic substitution rules.

[BibT_eX]

[DOI]

Proceedings of the 9th Symposium on Computer Arithmetic, 1989

1988

A Parallel-Serial Binary Arbitration Scheme for Collision-Free Multi-Access Techniques.

[BibT_eX]

[DOI]