Douglas Thain

ORCID: 0000-0001-5218-1956

According to our database, Douglas Thain authored at least 150 papers between 2000 and 2024.

Bibliography

2024
Reshaping High Energy Physics Applications for Near-Interactive Execution Using TaskVine.
Proceedings of the International Conference for High Performance Computing, 2024

Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Accelerating Function-Centric Applications by Discovering, Distributing, and Retaining Reusable Context in Workflow Systems.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

2023
Landlord: Coordinating Dynamic Software Environments to Reduce Container Sprawl.
IEEE Trans. Parallel Distributed Syst., May, 2023

VisDict: Improving Communication Via a Visual Dictionary in a Science Gateway.
Comput. Sci. Eng., 2023

TaskVine: Managing In-Cluster Storage for High-Throughput Data Intensive Workflows.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Maximizing Data Utility for HPC Python Workflow Execution.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Mixed Modality Workflows in TaskVine.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

2022
Dynamic Task Shaping for High Throughput Data Analysis Applications in High Energy Physics.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

PONCHO: Dynamic Package Synthesis for Distributed and Serverless Python Applications.
Proceedings of HiPS@HPDC 2022: the 2nd Workshop on High Performance Serverless Computing, 2022

Robust Meta-Workflow Management with Mufasa.
Proceedings of the 18th IEEE International Conference on e-Science, 2022

2021
Workflows Community Summit: Advancing the State-of-the-art of Scientific Workflows Management Systems Research and Development.
CoRR, 2021

Workflows Community Summit: Bringing the Scientific Workflows Community Together.
CoRR, 2021


Not All Tasks Are Created Equal: Adaptive Resource Allocation for Heterogeneous Tasks in Dynamic Workflows.
Proceedings of the 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 2021

Emerging Frameworks for Advancing Scientific Workflows Research, Development, and Education.
Proceedings of the 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 2021

Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

An Empirical Study of Package Dependencies and Lifetimes in Binder Python Containers.
Proceedings of the 17th IEEE International Conference on eScience, 2021

2020
Log Discovery for Troubleshooting Open Distributed Systems with TLQ.
Proceedings of the PEARC '20: Practice and Experience in Advanced Research Computing, 2020

Solving the Container Explosion Problem for Distributed High Throughput Computing.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Autoscaling High-Throughput Workloads on Container Orchestrators.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Flexible Partitioning of Scientific Workflows Using the JX Workflow Language.
Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (learning), 2019

Dynamic Sizing of Continuously Divisible Jobs for Heterogeneous Resources.
Proceedings of the 15th International Conference on eScience, 2019

2018
A Job Sizing Strategy for High-Throughput Scientific Workflows.
IEEE Trans. Parallel Distributed Syst., 2018

Combining Static and Dynamic Storage Management for Data Intensive Scientific Workflows.
IEEE Trans. Parallel Distributed Syst., 2018

Reproducibility in Scientific Computing.
ACM Comput. Surv., 2018

VC3: A Virtual Cluster Service for Community Computation.
Proceedings of the Practice and Experience on Advanced Research Computing, 2018

SHADHO: Massively Scalable Hardware-Aware Distributed Hyperparameter Optimization.
Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision, 2018

A lightweight model for right-sizing master-worker applications.
Proceedings of the International Conference for High Performance Computing, 2018

Automatic Dependency Management for Scientific Applications on Clusters.
Proceedings of the 2018 IEEE International Conference on Cloud Engineering, 2018

MAKER as a Service: Moving HPC Applications to Jetstream Cloud.
Proceedings of the 2018 IEEE International Conference on Cloud Engineering, 2018

Efficient Integration of Containers into Scientific Workflows.
Proceedings of the 9th Workshop on Scientific Cloud Computing, 2018

Early Experience Using Amazon Batch for Scientific Workflows.
Proceedings of the 9th Workshop on Scientific Cloud Computing, 2018

A First Look at the JX Workflow Language.
Proceedings of the 14th IEEE International Conference on e-Science, 2018

An Algebra for Robust Workflow Transformations.
Proceedings of the 14th IEEE International Conference on e-Science, 2018

Wharf: Sharing Docker Images in a Distributed File System.
Proceedings of the ACM Symposium on Cloud Computing, 2018

2017
Designing Self-Tuning Split-Map-Merge Applications for High Cost-Efficiency in the Cloud.
IEEE Trans. Cloud Comput., 2017

Report on the first workshop on negative and null results in eScience.
Concurr. Comput. Pract. Exp., 2017

Balancing push and pull in Confuga, an active storage cluster file system for scientific workflows.
Concurr. Comput. Pract. Exp., 2017

Taming metadata storms in parallel filesystems with metaFS.
Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 2017

Towards Scalable and Dynamic Social Sensing Using A Distributed Computing Framework.
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems, 2017

Facilitating the Reproducibility of Scientific Workflows with Execution Environment Specifications.
Proceedings of the International Conference on Computational Science, 2017

Deploying High Throughput Scientific Workflows on Container Schedulers with Makeflow and Mesos.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
DiaPro: Unifying Dynamic Impact Analyses for Improved and Variable Cost-Effectiveness.
ACM Trans. Softw. Eng. Methodol., 2016

DISTEA: Efficient Dynamic Impact Analysis for Distributed Systems.
CoRR, 2016

DistIA: a cost-effective dynamic impact analysis for distributed programs.
Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016

Conducting reproducible research with Umbrella: Tracking, creating, and preserving execution environments.
Proceedings of the 12th IEEE International Conference on e-Science, 2016

PRUNE: A preserving run environment for reproducible scientific computing.
Proceedings of the 12th IEEE International Conference on e-Science, 2016

2015
Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning and Workflow Fusion.
Scalable Comput. Pract. Exp., 2015

An invariant framework for conducting reproducible computational science.
J. Comput. Sci., 2015

Adapting Collaborative Software Development Techniques to Structural Engineering.
Comput. Sci. Eng., 2015

The Evolution of Global Scale Filesystems for Scientific Software Distribution.
Comput. Sci. Eng., 2015

DAGViz: a DAG visualization tool for analyzing task-parallel program traces.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Techniques for Preserving Scientific Software Executions: Preserve the Mess or Encourage Cleanliness?
Proceedings of the 12th International Conference on Digital Preservation, 2015

Integrating Containers into Workflows: A Case Study Using Makeflow, Work Queue, and Docker.
Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing, 2015

Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids.
Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing, 2015

Scaling Up Bioinformatics Workflows with Dynamic Job Expansion: A Case Study Using Galaxy and Makeflow.
Proceedings of the 11th IEEE International Conference on e-Science, 2015

Scaling Data Intensive Physics Applications to 10k Cores on Non-dedicated Clusters with Lobster.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Practical Resource Monitoring for Robust High Throughput Computing.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Balancing Thread-Level and Task-Level Parallelism for Data-Intensive Workloads on Clusters and Clouds.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Confuga: Scalable Data Intensive Computing for POSIX Workflows.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
AWE-WQ: Fast-Forwarding Molecular Dynamics Using the Accelerated Weighted Ensemble.
J. Chem. Inf. Model., 2014

Scaling up genome annotation using MAKER and work queue.
Int. J. Bioinform. Res. Appl., 2014

Lessons Learned from an Experiment in Crowdsourcing Complex Citizen Engineering Tasks with Amazon Mechanical Turk.
CoRR, 2014

Adapting bioinformatics applications for heterogeneous systems: a case study.
Concurr. Comput. Pract. Exp., 2014

Opportunistic High Energy Physics Computing in User Space with Parrot.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Expanding Tasks of Logical Workflows Into Independent Workflows for Improved Scalability.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Accelerating Comparative Genomics Workflows in a Distributed Environment with Optimized Data Partitioning.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

DeltaDB: A Scalable Database Design for Time-Varying Schema-Free Data.
Proceedings of the 2014 IEEE International Congress on Big Data, Anchorage, AK, USA, June 27, 2014

2013
Toward fine-grained online task characteristics estimation in scientific workflows.
Proceedings of WORKS 2013: 8th Workshop On Workflows in Support of Large-Scale Science, 2013

Automated packaging of bioinformatics workflows for portability and durability using makeflow.
Proceedings of WORKS 2013: 8th Workshop On Workflows in Support of Large-Scale Science, 2013

Design of an active storage cluster file system for DAG workflows.
Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems, 2013

Making work queue cluster-friendly for data intensive scientific applications.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Case Studies in Designing Elastic Applications.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Environmentally Opportunistic Computing.
Proceedings of the Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

A Framework for Scalable Genome Assembly on Clusters, Clouds, and Grids.
IEEE Trans. Parallel Distributed Syst., 2012

ROARS: a robust object archival system for data intensive scientific computing.
Distributed Parallel Databases, 2012

Scripting distributed scientific workflows using Weaver.
Concurr. Comput. Pract. Exp., 2012

Makeflow: a portable abstraction for data intensive computing on clusters, clouds, and grids.
Proceedings of the 1st ACM SIGMOD Workshop on Scalable Workflow Execution Engines and Technologies, 2012

Shifting the bioinformatics computing paradigm: A case study in parallelizing genome annotation using MAKER and Work Queue.
Proceedings of the IEEE 2nd International Conference on Computational Advances in Bio and Medical Sciences, 2012

A system for management of Computational Fluid Dynamics simulations for civil engineering.
Proceedings of the 8th IEEE International Conference on E-Science, 2012

Folding proteins at 500 ns/hour with Work Queue.
Proceedings of the 8th IEEE International Conference on E-Science, 2012

Resource Management for Elastic Cloud Workflows.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

Fine-Grained Access Control in the Chirp Distributed File System.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

2011
Biocompute 2.0: an improved collaborative workspace for data intensive bio-science.
Concurr. Comput. Pract. Exp., 2011

Expert-Citizen Engineering: "Crowdsourcing" Skilled Citizens.
Proceedings of the IEEE Ninth International Conference on Dependable, 2011

Converting a High Performance Application to an Elastic Cloud Application.
Proceedings of the IEEE 3rd International Conference on Cloud Computing Technology and Science, 2011

2010
All-Pairs: An Abstraction for Data-Intensive Computing on Campus Grids.
IEEE Trans. Parallel Distributed Syst., 2010

Visualizing massively multithreaded applications with ThreadScope.
Concurr. Comput. Pract. Exp., 2010

Harnessing parallelism in multicore clusters with the All-Pairs, Wavefront, and Makeflow abstractions.
Clust. Comput., 2010

Middleware support for many-task computing.
Clust. Comput., 2010

Biocompute: towards a collaborative workspace for data intensive bio-science.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Weaver: integrating distributed computing abstractions into scientific workflows using Python.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Towards long term data quality in a large scale biometrics experiment.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

ROARS: a scalable repository for data intensive scientific computing.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

Environmentally Opportunistic Computing transforming the data center for economic and environmental sustainability.
Proceedings of the International Green Computing Conference 2010, 2010

Grid, Cluster and Cloud Computing.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

A Comparison and Critique of Eucalyptus, OpenNebula and Nimbus.
Proceedings of the Cloud Computing, Second International Conference, 2010

Attaching Cloud Storage to a Campus Grid Using Parrot, Chirp, and Hadoop.
Proceedings of the Cloud Computing, Second International Conference, 2010

2009
Reflections on the virtues of modularity: a case study in linux security modules.
Softw. Pract. Exp., 2009

Chirp: a practical global filesystem for cluster and Grid computing.
J. Grid Comput., 2009

Experience with BXGrid: a data repository and computing grid for biometrics research.
Clust. Comput., 2009

Highly scalable genome assembly on campus grids.
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, 2009

Harnessing parallelism in multicore clusters with the all-pairs and wavefront abstractions.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

The quest for scalable support of data-intensive workloads in distributed systems.
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009

Scheduling Grid workloads on multicore clusters to minimize energy and maximize performance.
Proceedings of the 2009 10th IEEE/ACM International Conference on Grid Computing, 2009

Cooperative Localization in GPS-Limited Urban Environments.
Proceedings of the Ad Hoc Networks, First International Conference, 2009

Coordination of Access to Large-Scale Datasets in Distributed Environments.
Proceedings of the Scientific Data Management - Challenges, Technology, and Deployment, 2009

2008
Biomolecular committor probability calculation enabled by processing in network storage.
Parallel Comput., 2008

Making the best of a bad situation: Prioritized storage management in GEMS.
Future Gener. Comput. Syst., 2008

ENAVis: Enterprise Network Activities Visualization.
Proceedings of the 22nd Large Installation System Administration Conference, 2008

Qthreads: An API for programming with millions of lightweight threads.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

All-pairs: An abstraction for data-intensive cloud computing.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Data mining on the grid for the grid.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Scaling up Classifiers to Cloud Computers.
Proceedings of the 8th IEEE International Conference on Data Mining (ICDM 2008), 2008

DataLab: transactional data-parallel computing on an active storage cloud.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008

Troubleshooting thousands of jobs on production grids using data mining techniques.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

Using Small Abstractions to Program Large Distributed Systems.
Proceedings of the Fourth International Conference on e-Science, 2008

BXGrid: A Data Repository and Workflow Abstraction for Biometrics Research.
Proceedings of the Fourth International Conference on e-Science, 2008

2007
Challenges in Executing Data Intensive Biometric Workloads on a Desktop Grid.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Biomolecular Path Sampling Enabled by Processing in Network Storage.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Efficient access to many small files in a filesystem for grid computing.
Proceedings of the 8th IEEE/ACM International Conference on Grid Computing (GRID 2007), 2007

2006
How to measure a large open-source distributed system.
Concurr. Comput. Pract. Exp., 2006

Transparent access to Grid resources for user software.
Concurr. Comput. Pract. Exp., 2006

Access control for a replica management database.
Proceedings of the 2006 ACM Workshop On Storage Security And Survivability, 2006

iDIBS: An Improved Distributed Backup System.
Proceedings of the 12th International Conference on Parallel and Distributed Systems, 2006

Troubleshooting Distributed Systems via Data Mining.
Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, 2006

Operating System Support for Space Allocation in Grid Storage Systems.
Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

Cacheable Decentralized Groups for Grid Resource Access Control.
Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

Grid Deployment of Legacy Bioinformatics Applications with Transparent Data Access.
Proceedings of the 7th IEEE/ACM International Conference on Grid Computing (GRID 2006), 2006

Positioning Dynamic Storage Caches for Transient Data.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

Using Condor Glide-Ins and Parrot to Move from Dedicated Ressources to the Grid.
Proceedings of the ARCS 2006, 2006

2005
Parrot: Transparent User-Level Middleware for Data-Intensive Computing.
Scalable Comput. Pract. Exp., 2005

Distributed computing in practice: the Condor experience.
Concurr. Pract. Exp., 2005

The Consequences of Decentralized Security in a Cooperative Storage System.
Proceedings of the 3rd International IEEE Security in Storage Workshop (SISW 2005), 2005

Separating Abstractions from Resources in a Tactical Storage System.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Identity Boxing: A New Technique for Consistent Global Identity.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Generosity and gluttony in GEMS: grid enabled molecular simulations.
Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing, 2005

Identity boxing: secure user-level containment for the grid.
Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing, 2005

2004
Explicit Control in the Batch-Aware Distributed File System.
Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), 2004

Building Reliable Clients and Services.
Proceedings of the Grid 2 - Blueprint for a New Computing Infrastructure, Second Edition, 2004

2003
The Ethernet Approach to Grid Computing.
Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

Pipeline and Batch Sharing in Grid Workloads.
Proceedings of the 12th International Symposium on High-Performance Distributed Computing (HPDC-12 2003), 2003

XtremWeb & Condor sharing resources between Internet connected Condor pools.
Proceedings of the 3rd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003

2002
Caveat Emptor: Making Grid Services Dependable from the Client Side.
Proceedings of the 9th Pacific Rim International Symposium on Dependable Computing (PRDC 2002), 2002

Error Scope on a Computational Grid: Theory and Practice.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

2001
Multiple Bypass: Interposition Agents for Distributed Computing.
Clust. Comput., 2001

Gathering at the well: creating communities for grid I/O.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

The Kangaroo Approach to Data Movement on the Grid.
Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10 2001), 2001

2000
Bypass: A Tool for Building Split Execution Systems.
Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing, 2000

