Pavan Balaji

Affiliations:
  • Argonne National Laboratory

According to our database, Pavan Balaji authored at least 225 papers between 2002 and 2024.

Bibliography

2024
Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
CoRR, 2024

UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023
Near-Lossless MPI Tracing and Proxy Application Autogeneration.
IEEE Trans. Parallel Distributed Syst., 2023

2021
Logically Parallel Communication for Fast MPI+Threads Applications.
IEEE Trans. Parallel Distributed Syst., 2021

Guest Editorial.
IEEE Trans. Parallel Distributed Syst., 2021

Translational research in the MPICH project.
J. Comput. Sci., 2021

Pilgrim: scalable and (near) lossless MPI tracing.
Proceedings of the International Conference for High Performance Computing, 2021

Lightweight preemptive user-level threads.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

OpenSHMEM over MPI as a Performance Contender: Thorough Analysis and Optimizations.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 2021

Daps: A Dynamic Asynchronous Progress Stealing Model for MPI Communication.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

RMACXX: An Efficient High-Level C++ Interface over MPI-3 RMA.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
Analyzing the Performance Trade-Off in Implementing User-Level Threads.
IEEE Trans. Parallel Distributed Syst., 2020

Memory-Efficient and Skew-Tolerant MapReduce Over MPI for Supercomputing Systems.
IEEE Trans. Parallel Distributed Syst., 2020

Analysis of Threading Libraries for High Performance Computing.
IEEE Trans. Computers, 2020

CAB-MPI: exploring interprocess work-stealing towards balanced MPI communication.
Proceedings of the International Conference for High Performance Computing, 2020

Implementing Flexible Threading Support in Open MPI.
Proceedings of the Workshop on Exascale MPI, 2020

How I learned to stop worrying about user-visible endpoints and love MPI.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Probing the Underlying Implementation Mechanisms of SW26010.
Proceedings of the 22nd IEEE International Conference on High Performance Computing and Communications; 18th IEEE International Conference on Smart City; 6th IEEE International Conference on Data Science and Systems, 2020

2019
Scalable Deep Learning via I/O Analysis and Optimization.
ACM Trans. Parallel Comput., 2019

Guest Editor's Introduction: P2S2: SI 2016.
Parallel Comput., 2019

International workshop on programming models and applications for multicores and manycores (PMAM 2018).
Parallel Comput., 2019

Foreword to the special issue for the Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2 2017).
Parallel Comput., 2019

Special issue on the message passing interface.
Parallel Comput., 2019

Characterization of Power Usage and Performance in Data-Intensive Applications Using MapReduce over MPI.
Proceedings of the Parallel Computing: Technology Trends, 2019

Software combining to mitigate multithreaded MPI contention.
Proceedings of the ACM International Conference on Supercomputing, 2019

Optimized Execution of Parallel Loops via User-Defined Scheduling Policies.
Proceedings of the 48th International Conference on Parallel Processing, 2019

An Auto Code Generator for Stencil on SW26010.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Dynamic Adaptable Asynchronous Progress Model for MPI RMA Multiphase Applications.
IEEE Trans. Parallel Distributed Syst., 2018

Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

Lock Contention Management in Multithreaded MPI.
ACM Trans. Parallel Comput., 2018

Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models.
J. Supercomput., 2018

Introduction.
Int. J. High Perform. Comput. Appl., 2018

On the adequacy of lightweight thread approaches for high-level parallel programming models.
Future Gener. Comput. Syst., 2018

Lessons learned from analyzing dynamic promotion for user-level threading.
Proceedings of the International Conference for High Performance Computing, 2018

Characterization of MPI usage on a production supercomputer.
Proceedings of the International Conference for High Performance Computing, 2018

Scalable Communication Endpoints for MPI+Threads Applications.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

On the Power of Combiner Optimizations in MapReduce Over MPI Workflows.
Proceedings of the 24th IEEE International Conference on Parallel and Distributed Systems, 2018

Process-in-process: techniques for practical address-space sharing.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

K-mer Counting for Genomic Big Data.
Proceedings of the Big Data - BigData 2018, 2018

2017
Enabling scalable and accurate clustering of distributed ligand geometries on supercomputers.
Parallel Comput., 2017

Exploring versioned distributed arrays for resilience in scientific applications.
Int. J. High Perform. Comput. Appl., 2017

Special issue on programming models and applications for multicores and manycores.
Int. J. High Perform. Comput. Appl., 2017

Foreword to the Special Issue of the workshop on the seventh international workshop on programming models and applications for multicores and manycores (PMAM 2016).
Concurr. Comput. Pract. Exp., 2017

Memory Compression Techniques for Network Address Management in MPI.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Mimir: Memory-Efficient and Scalable MapReduce for Large Supercomputing Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

PDSEC Keynote.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

GLTO: On the Adequacy of Lightweight Thread Approaches for OpenMP Implementations.
Proceedings of the 46th International Conference on Parallel Processing, 2017

Parallel I/O Optimizations for Scalable Deep Learning.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Hexe: A Toolkit for Heterogeneous Memory Management.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Portable Topology-Aware MPI-I/O.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Bloomfish: A Highly Scalable Distributed K-mer Counting Framework.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Process-Based Asynchronous Progress Model for MPI Point-to-Point Communication.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Towards Scalable Deep Learning via I/O Analysis and Optimization.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Exploiting Common Neighborhoods to Optimize MPI Neighborhood Collectives.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017

GLT: A Unified API for Lightweight Thread Libraries.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

S-Aligner: Ultrascalable Read Mapping on Sunway Taihu Light.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

A Performance Study of UCX over InfiniBand.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Scalable Assembly for Massive Genomic Graphs.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Advanced Thread Synchronization for Multithreaded MPI Implementations.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
MPI-ACC: Accelerator-Aware MPI for Scientific Applications.
IEEE Trans. Parallel Distributed Syst., 2016

Survey of Techniques and Architectures for Designing Energy-Efficient Data Centers.
IEEE Syst. J., 2016

Special Issue on Cluster Computing.
Parallel Comput., 2016

A data-oriented profiler to assist in data partitioning and distribution for heterogeneous memory in HPC.
Parallel Comput., 2016

Special Issue on Parallel Programming Models and Systems Software for High-End Computing.
Parallel Comput., 2016

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL.
Parallel Comput., 2016

Performance analysis of data intensive cloud systems based on data management and replication: a survey.
Distributed Parallel Databases, 2016

An implementation and evaluation of the MPI 3.0 one-sided communication interface.
Concurr. Comput. Pract. Exp., 2016

Programming models and applications for multicores and manycores.
Concurr. Comput. Pract. Exp., 2016

Work stealing for GPU-accelerated parallel programs in a global address space framework.
Concurr. Comput. Pract. Exp., 2016

A survey and taxonomy on energy efficient resource allocation techniques for cloud computing systems.
Computing, 2016

Scaling FMM with Data-Driven OpenMP Tasks on Multicore Architectures.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Scalability Challenges in Current MPI One-Sided Implementations.
Proceedings of the 15th International Symposium on Parallel and Distributed Computing, 2016

SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Extreme Scale.
Proceedings of the 45th International Conference on Parallel Processing, 2016

One-Sided Interface for Matrix Operations Using MPI-3 RMA: A Case Study with Elemental.
Proceedings of the 45th International Conference on Parallel Processing, 2016

Compiler-Assisted Overlapping of Communication and Computation in MPI Applications.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

A Review of Lightweight Thread Approaches for High Performance Computing.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015
Scalable Network Communication Using Unreliable RDMA.
Proceedings of the Handbook on Data Centers, 2015

Remote Memory Access Programming in MPI-3.
ACM Trans. Parallel Comput., 2015

Scalable connectionless RDMA over unreliable datagrams.
Parallel Comput., 2015

Introduction Special Section of ICCCN 2014 Conference.
Comput. Commun., 2015

Improving concurrency and asynchrony in multithreaded MPI applications using software offloading.
Proceedings of the International Conference for High Performance Computing, 2015

VOCL-FT: introducing techniques for efficient soft error coprocessor recovery.
Proceedings of the International Conference for High Performance Computing, 2015

Fault tolerant MapReduce-MPI for HPC clusters.
Proceedings of the International Conference for High Performance Computing, 2015

MPI+Threads: runtime contention and remedies.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

Casper: An Asynchronous Progress Model for MPI RMA on Many-Core Architectures.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

HiCOMB 2015 Keynote and Invited Talks.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

AsHES Introduction and Committees.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Versioning Architectures for Local and Global Memory.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience.
Proceedings of the International Conference on Computational Science, 2015

MPI+ULT: Overlapping Communication and Computation with User-Level Threads.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Exploring the Suitability of Remote GPGPU Virtualization for the OpenACC Programming Model Using rCUDA.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Empirical Comparison of Three Versioning Architectures.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Flexible Error Recovery Using Versions in Global View Resilience.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Analyzing MPI-3.0 Process-Level Shared Memory: A Case Study with Stencil Computations.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Runtime Support for Irregular Computation in MPI-Based Applications.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Accurate Scoring of Drug Conformations at the Extreme Scale.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Scaling NWChem with Efficient and Portable Asynchronous Communication in MPI RMA.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Techniques for Enabling Highly Efficient Message Passing on Many-Core Architectures.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Implementation and Evaluation of MPI Nonblocking Collective I/O.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Toward Implementing Robust Support for Portals 4 Networks in MPICH.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Understanding Data Access Patterns Using Object-Differentiated Memory Profiling.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

SWAP-Assembler 2: Scalable Genome Assembler towards Millions of Cores - Practice and Experience.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Lessons Learned Implementing User-Level Failure Mitigation in MPICH.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

Characterizing MPI and Hybrid MPI+Threads Applications at Scale: Case Study with BFS.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Processing MPI Derived Datatypes on Noncontiguous GPU-Resident Data.
IEEE Trans. Parallel Distributed Syst., 2014

Special issue on programming models and applications for multicores and manycores - Guest Editors' Introduction.
Parallel Comput., 2014

Addressing failures in exascale computing.
Int. J. High Perform. Comput. Appl., 2014

Enabling communication concurrency through flexible MPI endpoints.
Int. J. High Perform. Comput. Appl., 2014

SWAP-Assembler: scalable and efficient genome assembly towards thousands of cores.
BMC Bioinform., 2014

Nonblocking Epochs in MPI One-Sided Communication.
Proceedings of the International Conference for High Performance Computing, 2014

MC-Checker: Detecting Memory Consistency Errors in MPI One-Sided Applications.
Proceedings of the International Conference for High Performance Computing, 2014

Simplifying the recovery model of user-level failure mitigation.
Proceedings of the 2014 Workshop on Exascale MPI, 2014

Implementing the MPI-3.0 Fortran 2008 Binding.
Proceedings of the 21st European MPI Users' Group Meeting, 2014

Portable, MPI-interoperable Coarray Fortran.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

MT-MPI: multithreaded MPI for many-core environments.
Proceedings of the 2014 International Conference on Supercomputing, 2014

A Framework for Tracking Memory Accesses in Scientific Applications.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

WorkQ: A many-core producer/consumer execution model applied to PGAS computations.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Toward the efficient use of multiple explicitly managed memory subsystems.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
Designing energy efficient communication runtime systems: a view from PGAS models.
J. Supercomput., 2013

Guest Editors' introduction.
J. Supercomput., 2013

A survey on resource allocation in high performance distributed computing systems.
Parallel Comput., 2013

Special issue on programming models, systems software, and tools for High-End Computing.
Parallel Comput., 2013

Guest Editors' Introduction: Special Issue on Applications for the Heterogeneous Computing Era.
Int. J. High Perform. Comput. Appl., 2013

Guest editors' introduction: Special issue on Cluster, Grid, and Cloud Computing.
Future Gener. Comput. Syst., 2013

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory.
Computing, 2013

An overview of energy efficiency techniques in cluster computing systems.
Clust. Comput., 2013

Container-Based Job Management for Fair Resource Sharing.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Analysis of topology-dependent MPI performance on Gemini networks.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Enabling MPI interoperability through flexible communication endpoints.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

Enhancing Performance Portability of MPI Applications through Annotation-Based Transformations.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

MPI-Interoperable Generalized Active Messages.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

Online Performance Projection for Clusters with Heterogeneous GPUs.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.
Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013

On the efficacy of GPU-integrated MPI for scientific applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

On the Reproducibility of MPI Reduction Operations.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

Topic 15: GPU and Accelerator Computing - (Introduction).
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Optimization Strategies for MPI-Interoperable Active Messages.
Proceedings of the IEEE 11th International Conference on Dependable, 2013

Toward Asynchronous and MPI-Interoperable Active Messages.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Applications for the Heterogeneous Computing Era.
Int. J. High Perform. Comput. Appl., 2012

Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Efficient Multithreaded Context ID Allocation in MPI.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Efficient Intranode Communication in GPU-Accelerated Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

Supporting the Global Arrays PGAS Model Using MPI One-Sided Communication.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

ASHES Introduction.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

DMA-Assisted, Intranode Communication in GPU Accelerated Systems.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Transparent Accelerator Migration in a Virtualized GPU Environment.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
MPI on Millions of Cores.
Parallel Process. Lett., 2011

Special Issue on Programming Models, Software and Tools for High-End Computing.
Int. J. High Perform. Comput. Appl., 2011

Special Issue on Programming Models and Systems Software Support for High-End Computing Applications.
Int. J. High Perform. Comput. Appl., 2011

Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems.
Comput. Sci. Res. Dev., 2011

Poster: High-level, one-sided programming models on MPI: a case study with global arrays and NWChem.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Multi-core and Network Aware MPI Topology Functions.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Noncollective Communicator Creation in MPI.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Dynamic Time-Variant Connection Management for PGAS Models on InfiniBand.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

RDMA Capable iWARP over Datagrams.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Building algorithmically nonstop fault tolerant MPI programs.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Energy-aware hierarchical scheduling of applications in large scale data centers.
Proceedings of the 2011 International Conference on Cloud and Service Computing, 2011

2010
A Pipelined Algorithm for Large, Irregular All-Gather Problems.
Int. J. High Perform. Comput. Appl., 2010

The Importance of Non-Data-Communication Overheads in MPI.
Int. J. High Perform. Comput. Appl., 2010

Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming.
Int. J. High Perform. Comput. Appl., 2010

Global-scale distributed I/O with ParaMEDIC.
Concurr. Comput. Pract. Exp., 2010

Implementing MPI on Windows: Comparison with Common Approaches on Unix.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

PMI: A Scalable Parallel Process-Management Interface for Extreme-Scale Systems.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

A study of hardware assisted IP over InfiniBand and its impact on enterprise data center performance.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Designing High-End Computing Systems with InfiniBand and High-Speed Ethernet.
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

Fault-tolerant communication runtime support for data-centric programming models.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

iWARP redefined: Scalable connectionless communication over high-speed Ethernet.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Designing Energy Efficient Communication Runtime Systems for Data Centric Programming Models.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

Power and Performance Characterization of Computational Kernels on the GPU.
Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications, 2010

Minimizing MPI Resource Contention in Multithreaded Multicore Environments.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

Hybrid parallel programming with MPI and unified parallel C.
Proceedings of the 7th Conference on Computing Frontiers, 2010

2009
ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures.
Comput. Sci. Res. Dev., 2009

Toward message passing for a million processes: characterizing MPI on a massive scale Blue Gene/P.
Comput. Sci. Res. Dev., 2009

Tools and Environments for Multicore and Many-Core Architectures.
Computer, 2009

MPI on a Million Processors.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

GePSeA: A General-Purpose Software Acceleration Framework for Lightweight Task Offloading.
Proceedings of the ICPP 2009, 2009

Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P.
Proceedings of the ICPP 2009, 2009

Evaluation of ConnectX Virtual Protocol Interconnect for Data Centers.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Tutorial: Designing High-End Computing Systems with InfiniBand and 10-Gigabit Ethernet.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Tutorial: InfiniBand and 10-Gigabit Ethernet for Dummies.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Natively Supporting True One-Sided Communication in.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

2008
Asymmetric interactions in symmetric multi-core systems: analysis, enhancements and evaluation.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Massively parallel genomic sequence search on the Blue Gene/P architecture.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

A Simple, Pipelined Algorithm for Large, Irregular All-gather Problems.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Non-data-communication Overheads in MPI: Analysis on Blue Gene/P.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Toward Efficient Support for Multithreaded MPI Communication.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Semantics-based distributed I/O for mpiBLAST.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Impact of Network Sharing in Multi-Core Architectures.
Proceedings of the 17th International Conference on Computer Communications and Networks, 2008

Semantic-based distributed I/O with the ParaMEDIC framework.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008

Making a Case for Proactive Flow Control in Optical Circuit-Switched Networks.
Proceedings of the High Performance Computing, 2008

Communication Analysis of Parallel 3D FFT for Flat Cartesian Meshes on Large Blue Gene Systems.
Proceedings of the High Performance Computing, 2008

Sockets Direct Protocol for Hybrid Network Stacks: A Case Study with iWARP over 10G Ethernet.
Proceedings of the High Performance Computing, 2008

Are nonblocking networks really needed for high-end-computing workloads?
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

2007
Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Nonuniformly Communicating Noncontiguous Data: A Case Study with PETSc and MPI.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Analyzing and Minimizing the Impact of Opportunity Cost in QoS-aware Job Scheduling.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

An Analysis of 10-Gigabit Ethernet Protocol Stacks in Multicore Environments.
Proceedings of the 15th Annual IEEE Symposium on High-Performance Interconnects, 2007

Designing high-end computing systems with InfiniBand and 10-Gigabit Ethernet iWARP.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

2006
Bridging the Ethernet-Ethernot Performance Gap.
IEEE Micro, 2006

Designing next generation data-centers with advanced communication protocols and systems services.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Asynchronous zero-copy communication for synchronous sockets in the sockets direct protocol (SDP) over InfiniBand.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005
Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks.
J. Parallel Distributed Comput., 2005

On the provision of prioritization and soft QoS in dynamically reconfigurable shared data-centers over InfiniBand.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Performance Characterization of a 10-Gigabit Ethernet TOE.
Proceedings of the 13th Annual IEEE Symposium on High Performance Interconnects (HOTIC 2005), 2005

Supporting iWARP Compatibility and Features for Regular Network Adapters.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines.
Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005

Architecture for caching responses with multiple dynamic dependencies in multi-tier data-centers over InfiniBand.
Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005

2004
Sockets Direct Protocol over InfiniBand in clusters: is it beneficial?
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Towards provision of quality of service guarantees in job scheduling.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
QoPS: A QoS Based Scheme for Parallel Job Scheduling.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003

Efficient Collective Operations Using Remote Memory Operations on VIA-Based Clusters.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Impact of High Performance Sockets on Data Intensive Applications.
Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

2002
High Performance User Level Sockets over Gigabit Ethernet.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002
