Thomas Hérault

Orcid: 0000-0001-6756-6189

Affiliations:
  • Innovative Computing Laboratory, the University of Tennessee, Knoxville, TN, USA
  • University of Paris-Sud, Laboratory for Computer Science (LRI), France (former)


According to our database, Thomas Hérault authored at least 106 papers between 2001 and 2024.

Bibliography

2024
Revisiting I/O bandwidth-sharing strategies for HPC applications.
J. Parallel Distributed Comput., 2024

Multi-GPU work sharing in a task-based dataflow programming model.
Future Gener. Comput. Syst., 2024

A survey on checkpointing strategies: Should we always checkpoint à la Young/Daly?
Future Gener. Comput. Syst., 2024

Evaluating PaRSEC Through Matrix Computations in Scientific Applications.
Proceedings of the Asynchronous Many-Task Systems and Applications, 2024

2023
When to checkpoint at the end of a fixed-length reservation?
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

2022
Comparing Distributed Termination Detection Algorithms for Modern HPC Platforms.
Int. J. Netw. Comput., 2022

Composition of Algorithmic Building Blocks in Template Task Graphs.
Proceedings of the IEEE/ACM Parallel Applications Workshop: Alternatives To MPI+X, 2022

Generalized Flow-Graph Programming Using Template Task-Graphs: Initial Implementation and Assessment.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Checkpointing à la Young/Daly: An Overview.
Proceedings of the 2022 Fourteenth International Conference on Contemporary Computing, 2022

Pushing the Boundaries of Small Tasks: Scalable Low-Overhead Data-Flow Programming in TTG.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Revisiting Credit Distribution Algorithms for Distributed Termination Detection.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

2020
Overhead of using spare nodes.
Int. J. High Perform. Comput. Appl., 2020

The Template Task Graph (TTG) - an emerging practical dataflow programming paradigm for scientific simulation at extreme scale.
Proceedings of the 5th IEEE/ACM International Workshop on Extreme Scale Programming Models and Middleware, 2020

A Comparison of Several Fault-Tolerance Methods for the Detection and Correction of Floating-Point Errors in Matrix-Matrix Multiplication.
Proceedings of the Euro-Par 2020: Parallel Processing Workshops, 2020

2019
Comparing the performance of rigid, moldable and grid-shaped applications on failure-prone HPC platforms.
Parallel Comput., 2019

Checkpointing Strategies for Shared High-Performance Computing Platforms.
Int. J. Netw. Comput., 2019

Generic Matrix Multiplication for Multi-GPU Accelerated Distributed-Memory Platforms over PaRSEC.
Proceedings of the 10th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2019

Performance Analysis of Tile Low-Rank Cholesky Factorization Using PaRSEC Instrumentation Tools.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Replication is more efficient than you think.
Proceedings of the International Conference for High Performance Computing, 2019

Software-Defined Events through PAPI.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

2018
Argobots: A Lightweight Low-Level Threading and Tasking Framework.
IEEE Trans. Parallel Distributed Syst., 2018

A failure detector for HPC platforms.
Int. J. High Perform. Comput. Appl., 2018

Optimal Cooperative Checkpointing for Shared High-Performance Computing Platforms.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Do Moldable Applications Perform Better on Failure-Prone HPC Platforms?
Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

2017
Dynamic task discovery in PaRSEC: a data-flow task-based runtime.
Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2017

2016
Assessing the cost of redistribution followed by a computational kernel: Complexity and performance results.
Parallel Comput., 2016

Failure detection and propagation in HPC systems.
Proceedings of the International Conference for High Performance Computing, 2016

2015
Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy.
ACM Trans. Parallel Comput., 2015

Composing resilience techniques: ABFT, periodic and incremental checkpointing.
Int. J. Netw. Comput., 2015

Practical scalable consensus for pseudo-synchronous distributed systems.
Proceedings of the International Conference for High Performance Computing, 2015

Sliding Substitution of Failed Nodes.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

From MPI to OpenSHMEM: Porting LAMMPS.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Technologies, 2015

Design for a Soft Error Resilient Dynamic Task-Based Runtime.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

2014
Performance and reliability trade-offs for the double checkpointing algorithm.
Int. J. Netw. Comput., 2014

Unified model for assessing checkpointing protocols at extreme-scale.
Concurr. Comput. Pract. Exp., 2014

PTG: an abstraction for unhindered parallelism.
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

A Multithreaded Communication Substrate for OpenSHMEM.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

Determining the Optimal Redistribution for a Given Data Partition.
Proceedings of the IEEE 13th International Symposium on Parallel and Distributed Computing, 2014

Assessing the Impact of ABFT and Checkpoint Composite Strategies.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Utilizing dataflow-based execution for coupled cluster methods.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2013
Hierarchical QR factorization algorithms for multi-core clusters.
Parallel Comput., 2013

Post-failure recovery of MPI communication capability: Design and rationale.
Int. J. High Perform. Comput. Appl., 2013

PaRSEC: Exploiting Heterogeneity to Enhance Scalability.
Comput. Sci. Eng., 2013

Correlated set coordination in fault tolerant message logging protocols for many-core clusters.
Concurr. Comput. Pract. Exp., 2013

Extending the scope of the Checkpoint-on-Failure protocol for forward recovery in standard MPI.
Concurr. Comput. Pract. Exp., 2013

An evaluation of User-Level Failure Mitigation support in MPI.
Computing, 2013

Optimal Checkpointing Period: Time vs. Energy.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

On the Combination of Silent Error Detection and Checkpointing.
Proceedings of the IEEE 19th Pacific Rim International Symposium on Dependable Computing, 2013

Revisiting the Double Checkpointing Algorithm.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Multi-criteria Checkpointing Strategies: Response-Time versus Resource Utilization.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012
DAGuE: A generic distributed DAG engine for High Performance Computing.
Parallel Comput., 2012

Algorithm-based fault tolerance for dense matrix factorizations.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Hierarchical QR Factorization Algorithms for Multi-core Cluster Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Scalable Dense Linear Algebra on Heterogeneous Hardware.
Proceedings of the Transition of HPC Towards Exascale Computing, 2012

From Serial Loops to Parallel Execution on Distributed Systems.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
QCG-OMPI: MPI applications on grids.
Future Gener. Comput. Syst., 2011

Scalable Runtime for MPI: Efficiently Building the Communication Infrastructure.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

Flexible Development of Dense Linear Algebra Algorithms on Massively Parallel Architectures with DPLASMA.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Correlated Set Coordination in Fault Tolerant Message Logging Protocols.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Process Distance-Aware Adaptive MPI Collective Communications.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

On Scalability for MPI Runtime Systems.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Performance Portability of a GPU Enabled Factorization with the DAGuE Framework.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

2010
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Supple: a flexible probabilistic data dissemination protocol for wireless sensor networks.
Proceedings of the 13th International Symposium on Modeling Analysis and Simulation of Wireless and Mobile Systems, 2010

DSL-Lab: A Low-Power Lightweight Platform to Experiment on Domestic Broadband Internet.
Proceedings of the Ninth International Symposium on Parallel and Distributed Computing, 2010

QR factorization of tall and skinny matrices in a grid computing environment.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

SAFE-OS: A secure and usable desktop operating system.
Proceedings of the CRiSIS 2010, 2010

Scalability and Parallelization of Monte-Carlo Tree Search.
Proceedings of the Computers and Games - 7th International Conference, 2010

Planning Large Data Transfers in Institutional Grids.
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009
Foreword.
Parallel Comput., 2009

Hierarchical Replication Techniques to Ensure Checkpoint Storage Reliability in Grid Environment.
J. Interconnect. Networks, 2009

Constructing Resilient Communication Infrastructure for Runtime Environments.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

MPI Applications on Grids: A Topology Aware Approach.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Running Parallel Applications with Topology-Aware Grid Middleware.
Proceedings of the Fifth International Conference on e-Science, 2009

High accuracy failure injection in parallel and distributed systems using virtualization.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols.
Future Gener. Comput. Syst., 2008

Cell Assisted APMC.
Proceedings of the Fifth International Conference on the Quantitative Evaluation of Systems (QEST 2008), 2008

On the Complexity of a Self-Stabilizing Spanning Tree Algorithm for Large Scale Systems.
Proceedings of the 14th IEEE Pacific Rim International Symposium on Dependable Computing, 2008

Emulation platform for high accuracy failure injection in grids.
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

Grid Services for MPI.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
Virtual Parallel Machines Through Virtualization: Impact on MPI Executions.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

Grid Services for MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

A Model for Large Scale Self-Stabilization.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A Distributed and Replicated Service for Checkpoint Storage.
Proceedings of the Making Grids Work: Proceedings of the CoreGRID Workshop on Programming Models Grid and P2P System Architecture Grid Systems, 2007

2006
MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI.
Int. J. High Perform. Comput. Appl., 2006

Hybrid Preemptive Scheduling of Message Passing Interface Applications on Grids.
Int. J. High Perform. Comput. Appl., 2006

Evaluating Complex MAC Protocols for Sensor Networks with APMC.
Proceedings of the 6th International Workshop on Automated Verification of Critical Systems, 2006

Brief Announcement: Self-stabilizing Spanning Tree Algorithm for Large Scale Systems.
Proceedings of the Stabilization, 2006

MPI tools and performance studies - Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Probabilistic verification of sensor networks.
Proceedings of the 4th International Conference on Computer Sciences: Research, 2006

APMC 3.0: Approximate Verification of Discrete and Continuous Time Markov Chains.
Proceedings of the Third International Conference on the Quantitative Evaluation of Systems (QEST 2006), 2006

FAIL-MPI: How Fault-Tolerant Is Fault-Tolerant MPI?
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
Computing on large-scale distributed systems: XtremWeb architecture, programming models, security, tests and convergence with grid.
Future Gener. Comput. Syst., 2005

Distribution, Approximation and Probabilistic Model Checking.
Proceedings of the 4th International Workshop on Parallel and Distributed Methods in Verification, 2005

Impact of Event Logger on Causal Message Logging Protocols for Fault Tolerant MPI.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004
Probabilistic Model Checking of the CSMA/CD Protocol Using PRISM and APMC.
Proceedings of the Fourth International Workshop on Automated Verification of Critical Systems, 2004

Approximate Probabilistic Model Checking.
Proceedings of the Verification, 2004

RPC-V: Toward Fault-Tolerant RPC for Internet Connected Desktop Grids with Volatile Nodes.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Hybrid Preemptive Scheduling of MPI Applications on the Grids.
Proceedings of the 5th International Workshop on Grid Computing (GRID 2004), 2004

Improved message logging versus improved coordinated checkpointing for fault tolerant MPI.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004

2003
MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging.
Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

2002
Fault-Local Stabilization: The Shortest Path Tree.
Proceedings of the 21st Symposium on Reliable Distributed Systems (SRDS 2002), 2002

MPICH-V: toward a scalable fault tolerant MPI for volatile nodes.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

2001
Easy Stabilization with an Agent.
Proceedings of the Self-Stabilizing Systems, 5th International Workshop, 2001

