2025
Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey.
CoRR, January, 2025
HDF5 in the exascale era: Delivering efficient and scalable parallel I/O for exascale applications.
Int. J. High Perform. Comput. Appl., 2025
2024
h5bench: A unified benchmark suite for evaluating HDF5 I/O performance on pre-exascale platforms.
Concurr. Comput. Pract. Exp., July, 2024
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems.
IEEE Trans. Parallel Distributed Syst., May, 2024
I/O Access Patterns in HPC Applications: A 360-Degree Survey.
ACM Comput. Surv., February, 2024
I/O in Machine Learning Applications on HPC Systems: A 360-degree Survey.
CoRR, 2024
Data Readiness for AI: A 360-Degree Survey.
CoRR, 2024
AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI.
Proceedings of the 36th International Conference on Scientific and Statistical Database Management, 2024
Enabling Data Reduction for Flash-X Simulations.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
A2FL: Autonomous and Adaptive File Layout in HPC through Real-time Access Pattern Analysis.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
TunIO: An AI-powered Framework for Optimizing HPC I/O.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Message from the 2024 Workshops Chair and Vice-chair.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Drilling Down I/O Bottlenecks with Cross-layer I/O Profile Exploration.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
The Art of Sparsity: Mastering High-Dimensional Tensor Storage.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
ION: Navigating the HPC I/O Optimization Journey using Large Language Models.
Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 2024
A High-Performance Collective I/O Framework Leveraging Node-Local Persistent Memory.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024
Object-Centric Data Management in HPC Workflows - A Case Study.
Proceedings of the IEEE International Conference on Cluster Computing, 2024
IDIOMS: Index-powered Distributed Object-centric Metadata Search for Scientific Data Management.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024
Evaluating Performance Trade-offs of Caching Strategies for AI-Powered Querying Systems.
Proceedings of the IEEE International Conference on Big Data, 2024
TensorSearch: Parallel Similarity Search on Tensors.
Proceedings of the IEEE International Conference on Big Data, 2024
2023
Design and implementation of I/O performance prediction scheme on HPC systems through large-scale log analysis.
J. Big Data, December, 2023
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems.
CoRR, 2023
Illuminating the I/O Optimization Path of Scientific Applications.
Proceedings of the High Performance Computing - 38th International Conference, 2023
Evaluating Asynchronous Parallel I/O on HPC Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Efficient Asynchronous I/O with Request Merging.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
AIIO: Using Artificial Intelligence for Job-Level and Automatic I/O Performance Bottleneck Diagnosis.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023
HiPC 2023 Student Research Symposium (HiPC SRS 2023).
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023
Runway: In-transit Data Compression on Heterogeneous HPC Systems.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
PSQS: Parallel Semantic Querying Service for Self-describing File Formats.
Proceedings of the IEEE International Conference on Big Data, 2023
2022
Transparent Asynchronous Parallel I/O Using Background Threads.
IEEE Trans. Parallel Distributed Syst., 2022
Real-time and post-hoc compression for data from Distributed Acoustic Sensing.
Comput. Geosci., 2022
A Comparison of HDF5, Zarr, and netCDF4 in Performing Common I/O Operations.
CoRR, 2022
The LBNL Superfacility Project Report.
CoRR, 2022
Design and implementation of dynamic I/O control scheme for large scale distributed file systems.
Clust. Comput., 2022
Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5.
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Accelerating Flash-X Simulations with Asynchronous I/O.
Proceedings of the IEEE/ACM International Parallel Data Systems Workshop, 2022
Drishti: Guiding End-Users in the I/O Optimization Journey.
Proceedings of the IEEE/ACM International Parallel Data Systems Workshop, 2022
Improving Prediction-Based Lossy Compression Dramatically via Ratio-Quality Modeling.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022
Kv2vec: A Distributed Representation Method for Key-value Pairs from Metadata Attributes.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2022
PROV-IO: An I/O-Centric Provenance Framework for Scientific Data on HPC Systems.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022
Understanding Parallel I/O Performance and Tuning.
Proceedings of the SNTA@HPDC 2022, 2022
Access Patterns and Performance Behaviors of Multi-layer Supercomputer I/O Subsystems under Production Load.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022
HDF5 Cache VOL: Efficient and Scalable Parallel I/O through Caching Data on Node-local Storage.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022
2021
User-Defined Tensor Data Analysis.
Springer Briefs in Computer Science, Springer, ISBN: 978-3-030-70749-1, 2021
Exploiting user activeness for data retention in HPC systems.
Proceedings of the International Conference for High Performance Computing, 2021
Data-Aware Storage Tiering for Deep Learning.
Proceedings of the 6th IEEE/ACM International Parallel Data Systems Workshop, 2021
SCTuner: An Autotuner Addressing Dynamic I/O Needs on Supercomputer I/O Subsystems.
Proceedings of the 6th IEEE/ACM International Parallel Data Systems Workshop, 2021
I/O Bottleneck Detection and Tuning: Connecting the Dots using Interactive Log Analysis.
Proceedings of the 6th IEEE/ACM International Parallel Data Systems Workshop, 2021
An In-Depth I/O Pattern Analysis in HPC Systems.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights.
Proceedings of the IEEE International Conference on Cluster Computing, 2021
Battle of the Defaults: Extracting Performance Characteristics of HDF5 under Production Load.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021
Tuning Parallel Data Compression and I/O for Large-scale Earthquake Simulation.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021
Optimizing Performance of Parallel I/O Accesses to Non-contiguous Blocks in Multiple Array Variables.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021
2020
ExaHDF5: Delivering Efficient Parallel I/O on Exascale Computing Systems.
J. Comput. Sci. Technol., 2020
Interfacing HDF5 with a scalable object-centric storage system on hierarchical storage.
Concurr. Comput. Pract. Exp., 2020
GPU Direct I/O with HDF5.
Proceedings of the Fifth IEEE/ACM International Parallel Data Systems Workshop, 2020
Cross-facility science with the Superfacility Project at LBNL.
Proceedings of the 2nd IEEE/ACM Annual Workshop on Extreme-scale Experiment-in-the-Loop Computing, 2020
Parallel Query Service for Object-centric Data Management Systems.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020
Predicting and Comparing the Performance of Array Management Libraries.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
DASSA: Parallel DAS Data Storage and Analysis for Subsurface Event Detection.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Towards HPC I/O Performance Prediction through Large-scale Log Analysis.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020
HPC Workload Characterization Using Feature Selection and Clustering.
Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, 2020
Uncovering Access, Reuse, and Sharing Characteristics of I/O-Intensive Files on Large-Scale Production HPC Systems.
Proceedings of the 18th USENIX Conference on File and Storage Technologies, 2020
2019
Optimizing I/O Performance of HPC Applications with Autotuning.
ACM Trans. Parallel Comput., 2019
Parallel membership queries on very large scientific data sets using bitmap indexes.
Concurr. Comput. Pract. Exp., 2019
SLOPE: Structural Locality-Aware Programming Model for Composing Array Data Analysis.
Proceedings of the High Performance Computing - 34th International Conference, 2019
Terabyte-scale Particle Data Analysis: An ArrayUDF Case Study.
Proceedings of the 31st International Conference on Scientific and Statistical Database Management, 2019
Enabling Transparent Asynchronous I/O using Background Threads.
Proceedings of the IEEE/ACM Fourth International Parallel Data Systems Workshop, 2019
Revisiting I/O behavior in large-scale storage systems: the expected and the unexpected.
Proceedings of the International Conference for High Performance Computing, 2019
Sparse Data Management in HDF5.
Proceedings of the 1st IEEE/ACM Annual Workshop on Large-scale Experiment-in-the-Loop Computing, 2019
Understanding Data Motion in the Modern HPC Data Center.
Proceedings of the IEEE/ACM Fourth International Parallel Data Systems Workshop, 2019
Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance.
Proceedings of the IEEE/ACM Fourth International Parallel Data Systems Workshop, 2019
MIQS: metadata indexing and querying service for self-describing file formats.
Proceedings of the International Conference for High Performance Computing, 2019
Exploring Metadata Search Essentials for Scientific Data Management.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
Analysis in the Data Path of an Object-Centric Data Management System.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
Tuning Object-Centric Data Management Systems for Large Scale Scientific Applications.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
A Zoom-in Analysis of I/O Logs to Detect Root Causes of I/O Performance Bottlenecks.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
DCA-IO: A Dynamic I/O Control Scheme for Parallel and Distributed File Systems.
Proceedings of the 19th IEEE/ACM International Symposium on Cluster, 2019
2018
A year in the life of a parallel file system.
Proceedings of the International Conference for High Performance Computing, 2018
Evaluation of HPC Application I/O on Object Storage Systems.
Proceedings of the 3rd IEEE/ACM International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 2018
ArrayBridge: Interweaving Declarative Array Processing in SciDB with Imperative HDF5-Based Programs.
Proceedings of the 34th IEEE International Conference on Data Engineering, 2018
Toward Transparent Data Management in Multi-Layer Storage Hierarchy of HPC Systems.
Proceedings of the 2018 IEEE International Conference on Cloud Engineering, 2018
IOMiner: Large-Scale Analytics Framework for Gaining Knowledge from I/O Logs.
Proceedings of the IEEE International Conference on Cluster Computing, 2018
UniviStor: Integrated Hierarchical and Distributed Storage for HPC.
Proceedings of the IEEE International Conference on Cluster Computing, 2018
A Transparent Server-Managed Object Storage System for HPC.
Proceedings of the IEEE International Conference on Cluster Computing, 2018
Toward Scalable and Asynchronous Object-Centric Data Management for HPC.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018
ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018
DART: distributed adaptive radix tree for efficient affix-based keyword search on HPC systems.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018
2017
ArrayBridge: Interweaving declarative array processing with high-performance computing.
CoRR, 2017
UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis.
Proceedings of the 2nd Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, 2017
ArrayUDF: User-Defined Scientific Data Analysis on Arrays.
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017
SoMeta: Scalable Object-Centric Metadata Management for High Performance Computing.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
2016
AMR-aware in situ indexing and scalable querying.
Proceedings of the 24th High Performance Computing Symposium, 2016
PANDA: Extreme Scale Parallel K-Nearest Neighbor on Distributed Architectures.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
In Situ Storage Layout Optimization for AMR Spatio-temporal Read Accesses.
Proceedings of the 45th International Conference on Parallel Processing, 2016
SDS-Sort: Scalable Dynamic Skew-aware Parallel Sorting.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016
Data Elevator: Low-Contention Data Movement in Hierarchical Storage System.
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016
AMRZone: A Runtime AMR Data Sharing Framework for Scientific Applications.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016
Usage Pattern-Driven Dynamic Data Layout Reorganization.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016
Exploring memory hierarchy and network topology for runtime AMR data sharing across scientific applications.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016
2015
Towards Exascale Scientific Metadata Management.
CoRR, 2015
Techniques for modeling large-scale HPC I/O workloads.
Proceedings of the 6th International Workshop on Performance Modeling, 2015
BD-CATS: big data clustering at trillion particle scale.
Proceedings of the International Conference for High Performance Computing, 2015
Heavy-tailed distribution of parallel I/O system response time.
Proceedings of the 10th Parallel Data Storage Workshop, 2015
Pattern-driven parallel I/O tuning.
Proceedings of the 10th Parallel Data Storage Workshop, 2015
Collective Computing for Scientific Big Data Analysis.
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015
A Multiplatform Study of I/O Behavior on Petascale Supercomputers.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015
Dynamic Model-Driven Parallel I/O Performance Tuning.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Parallel In Situ Detection of Connected Components in Adaptive Mesh Refinement Data.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015
TECA: Petascale Pattern Recognition for Climate Science.
Proceedings of the Computer Analysis of Images and Patterns, 2015
Security for the scientific data services framework.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015
Spatially clustered join on heterogeneous scientific data sets.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015
2014
Parallel data analysis directly on scientific file formats.
Proceedings of the International Conference on Management of Data, 2014
Model-Driven Data Layout Selection for Improving Read Performance.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014
Simplifying index file structure to improve I/O performance of parallel indexing.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014
Improving parallel I/O autotuning with performance modeling.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014
Parallel query evaluation as a Scientific Data Service.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014
2013
Why high performance visual data analytics is both relevant and difficult.
Proceedings of the Visualization and Data Analysis 2013, 2013
Optimizing fastquery performance on lustre file system.
Proceedings of the Conference on Scientific and Statistical Database Management, 2013
SDS: a framework for scientific data services.
Proceedings of the 8th Parallel Data Storage Workshop, 2013
Taming parallel I/O complexity with auto-tuning.
Proceedings of the International Conference for High Performance Computing, 2013
A framework for auto-tuning HDF5 applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013
Expediting scientific data analysis with reorganization of data.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013
Segmented analysis for reducing data movement.
Proceedings of the 2013 IEEE International Conference on Big Data (IEEE BigData 2013), 2013
2012
TECA: A Parallel Toolkit for Extreme Climate Analysis.
Proceedings of the International Conference on Computational Science, 2012
Parallel I/O, analysis, and visualization of a trillion particle simulation.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Abstract: Auto-Tuning of Parallel IO Parameters for HDF5 Applications.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Boosting Application-Specific Parallel I/O Optimization Using IOSIG.
Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012
2011
Special issue on Data Intensive Computing.
J. Parallel Distributed Comput., 2011
Energy-Aware Workload Consolidation on GPU.
Proceedings of the 2011 International Conference on Parallel Processing Workshops, 2011
2010
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory.
Proceedings of the SPAA 2010: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures, 2010
Exploiting the forgiving nature of applications for scalable parallel execution.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Best-effort semantic document search on GPUs.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010
2009
Special Issue of the Journal of Parallel and Distributed Computing: Data-Intensive Computing.
J. Parallel Distributed Comput., 2009
Taxonomy of Data Prefetching for Multicore Processors.
J. Comput. Sci. Technol., 2009
Core-aware memory access scheduling schemes.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Modeling Data Access Contention in Multicore Architectures.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009
2008
Hiding I/O latency with pre-execution prefetching for parallel applications.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
Parallel I/O prefetching using MPI file caching and I/O signatures.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
A Taxonomy of Data Prefetching Mechanisms.
Proceedings of the 9th International Symposium on Parallel Architectures, 2008
Exploring Parallel I/O Concurrency with Speculative Prefetching.
Proceedings of the 2008 International Conference on Parallel Processing, 2008
2007
Server-Based Data Push Architecture for Multi-Processor Environments.
J. Comput. Sci. Technol., 2007
Data access history cache and associated data prefetching mechanisms.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007
Improving Data Access Performance with Server Push Architecture.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
2006
Automatic Memory Optimizations for Improving MPI Derived Datatype Performance.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006
Memory Servers: A Scope of SOA for High-End Computing.
Proceedings of the 2006 IEEE International Conference on Services Computing (SCC 2006), 2006
2005
Isolating Costs in Shared Memory Communication Buffering.
Parallel Process. Lett., 2005
2004
Predicting memory-access cost based on data-access patterns.
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004
2003
Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost.
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003