George Bosilca

Manjunath Gorentla Venkata

Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

GPU-Aware Non-contiguous Data Movement In Open MPI.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Exploiting a Parametrized Task Graph Model for the Parallelization of a Sparse Direct Multifrontal Solver.

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

DSN 2016 Tutorial: Resilience for Scientific Computing: From Theory to Practice.

Franck Cappello

Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2016

2015

Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.

ACM Trans. Parallel Comput., 2015

Composing resilience techniques: ABFT, periodic and incremental checkpointing.

Int. J. Netw. Comput., 2015

Practical scalable consensus for pseudo-synchronous distributed systems.

Proceedings of the International Conference for High Performance Computing, 2015

Sliding Substitution of Failed Nodes.

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Plan B: Interruption of Ongoing MPI Operations to Support Failure Recovery.

Proceedings of the 22nd European MPI Users' Group Meeting, 2015

Accelerating NWChem Coupled Cluster Through Dataflow-Based Execution.

Proceedings of the Parallel Processing and Applied Mathematics, 2015

From MPI to OpenSHMEM: Porting LAMMPS.

Chunyan Tang

Manjunath Gorentla Venkata

Thomas Hérault

Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Hierarchical DAG Scheduling for Hybrid Distributed Systems.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Design for a Soft Error Resilient Dynamic Task-Based Runtime.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

UCX: An Open Source Framework for HPC Network APIs and Beyond.

Proceedings of the 23rd IEEE Annual Symposium on High-Performance Interconnects, 2015

PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution.

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

An efficient distributed randomized algorithm for solving large dense symmetric indefinite linear systems.

Parallel Comput., 2014

Power profiling of Cholesky and QR factorizations on distributed memory systems.

Hatem Ltaief

Horacio Emilio Pérez Sánchez

Comput. Sci. Res. Dev., 2014

Unified model for assessing checkpointing protocols at extreme-scale.

Concurr. Comput. Pract. Exp., 2014

PTG: an abstraction for unhindered parallelism.

Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

Optimizations to enhance sustainability of MPI applications.

Alexey L. Lastovetsky

Christi Symeonidou

José M. Cecilia

Proceedings of the 21st European MPI Users' Group Meeting, 2014

A Multithreaded Communication Substrate for OpenSHMEM.

Thomas Hérault

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Taking Advantage of Hybrid Systems for Sparse Direct Solvers via Task-Based Runtimes.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Assessing the Impact of ABFT and Checkpoint Composite Strategies.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Task-Based Programming for Seismic Imaging: Preliminary Results.

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Assembly Operations for Multicore Architectures Using Task-Based Runtime Systems.

Damien Genet

Abdou Guermouche

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Utilizing dataflow-based execution for coupled cluster methods.

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms.

J. Parallel Distributed Comput., 2013

Post-failure recovery of MPI communication capability: Design and rationale.

Int. J. High Perform. Comput. Appl., 2013

PaRSEC: Exploiting Heterogeneity to Enhance Scalability.

Comput. Sci. Eng., 2013

Correlated set coordination in fault tolerant message logging protocols for many-core clusters.

Concurr. Comput. Pract. Exp., 2013

Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.

Concurr. Comput. Pract. Exp., 2013

An evaluation of User-Level Failure Mitigation support in MPI.

Computing, 2013

CPU-GPU hybrid bidiagonal reduction with soft error resilience.

Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2013

Parallel reduction to hessenberg form with algorithm-based fault tolerance.

Proceedings of the International Conference for High Performance Computing, 2013

Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures.

Proceedings of the IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, 2013

2012

DAGuE: A generic distributed DAG engine for High Performance Computing.

Parallel Comput., 2012

Poster: Matrices over Runtime Systems at Exascale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Matrices Over Runtime Systems at Exascale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Algorithm-based fault tolerance for dense matrix factorizations.

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

HierKNEM: An Adaptive Framework for Kernel-Assisted and Topology-Aware Collective Communications on Many-core Clusters.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Dense Linear Algebra on Heterogeneous Hardware.

Proceedings of the Transition of HPC Towards Exascale Computing, 2012

From Serial Loops to Parallel Execution on Distributed Systems.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI.

Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011

Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

OMPIO: A Modular Software Architecture for MPI I/O.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure.

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Will MPI Remain Relevant?

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Kernel Assisted Collective Intra-node MPI Communication among Multi-Core and Many-Core CPUs.

Proceedings of the International Conference on Parallel Processing, 2011

The Common Communication Interface (CCI).

Proceedings of the IEEE 19th Annual Symposium on High Performance Interconnects, 2011

Correlated Set Coordination in Fault Tolerant Message Logging Protocols.

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar 2011).

Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Process Distance-Aware Adaptive MPI Collective Communications.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

On Scalability for MPI Runtime Systems.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.

Narapat Ohm Saengpatsa

Stanimire Tomov

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010

Improvement of parallelization efficiency of batch pattern BP training algorithm using Open MPI.

Proceedings of the International Conference on Computational Science, 2010

Self-healing network for scalable fault-tolerant runtime environments.

Future Gener. Comput. Syst., 2010

Redesigning the message logging model for high performance.

Concurr. Comput. Pract. Exp., 2010

Locality and Topology Aware Intra-node Communication among Multicore CPUs.

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols.

Proceedings of the Recent Advances in the Message Passing Interface, 2010

2009

Algorithm-based fault tolerance applied to high performance computing.

J. Parallel Distributed Comput., 2009

Constructing Resiliant Communication Infrastructure for Runtime Environments.

Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

Reasons for a pessimistic or optimistic message logging protocol in MPI uncoordinated failure, recovery.

Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008

Algorithmic Based Fault Tolerance Applied to High Performance Computing.

CoRR, 2008

The Next Frontier.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

A Scalable Tools Communications Infrastructure.

Proceedings of the 22nd Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2008), 2008

2007

Recovery Patterns for Iterative Methods in a Parallel Unstable Environment.

SIAM J. Sci. Comput., 2007

Open MPI: a High Performance, Flexible Implementation of MPI Point-to-Point Communications.

Parallel Process. Lett., 2007

MPI collective algorithm selection and quadtree encoding.

Parallel Comput., 2007

Performance analysis of MPI collective operations.

Clust. Comput., 2007

Advanced MPI Programming.

Julien Langou

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

An Evaluation of Open MPI's Matching Transport Layer on the Cray XT.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Retrospect: Deterministic Replay of MPI Applications for Interactive Distributed Debugging.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

The X-Scale Challenge.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Optimal Routing in Binomial Graph Networks.

Bradley T. Vander Zanden

Proceedings of the Eighth International Conference on Parallel and Distributed Computing, 2007

Self-healing in Binomial Graph Networks.

Proceedings of the On the Move to Meaningful Internet Systems 2007: OTM 2007 Workshops, 2007

Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology.

Proceedings of the Parallel and Distributed Processing and Applications, 2007

Network Fault Tolerance in Open MPI.

Galen M. Shipman

Richard L. Graham

Proceedings of the Euro-Par 2007, 2007

Decision Trees and MPI Collective Algorithm Selection Problem.

Proceedings of the Euro-Par 2007, 2007

Topic 9 Parallel and Distributed Programming.

Proceedings of the Euro-Par 2007, 2007

Reliability Analysis of Self-Healing Network using Discrete-Event Simulation.

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

2006

Self-adapting numerical software (SANS) effort.

Keith Seymour

Haihang You

Sathish S. Vadhiyar

IBM J. Res. Dev., 2006

High Performance RDMA Protocols in HPC.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Implementation and Usage of the PERUSE-Interface in Open MPI.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Scalable Fault Tolerant Protocol for Parallel Runtime Environments.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Open MPI: A High-Performance, Heterogeneous MPI.

Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005

Process Fault Tolerance: Semantics, Design and Applications for High Performance Computing.

Int. J. High Perform. Comput. Appl., 2005

Hash Functions for Datatype Signatures in MPI.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Advanced Message Passing and Threading Issues.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Scalable Fault Tolerant MPI: Extending the Recovery Algorithm.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Analysis of the Component Architecture Overhead in Open MPI.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Fault tolerant high performance computing by a coding approach.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

2004

OVM, une machine parallèle virtuelle à exécution dans le désordre.

Tech. Sci. Informatiques, 2004

TEG: A High-Performance, Scalable, Multi-network Point-to-Point Communications Methodology.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Open MPI's TEG Point-to-Point Communications Methodology: Comparison to Existing Implementations.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004

2002

OVM: Out-of-order execution parallel virtual machine.
