Dong Dai

Orcid: 0000-0003-4078-8149

Affiliations:
  • University of North Carolina at Charlotte, Department of Computer Science, Charlotte, NC, USA
  • Texas Tech University, Department of Computer Science, Lubbock, TX, USA
  • Argonne National Laboratory, Lemont, IL, USA
  • University of Science and Technology of China, Department of Computer Science and Technology, Hefei, China (PhD)


According to our database, Dong Dai authored at least 69 papers between 2012 and 2024.

Bibliography

2024
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems.
IEEE Trans. Parallel Distributed Syst., May, 2024

Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

ION: Navigating the HPC I/O Optimization Journey using Large Language Models.
Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 2024

2023
Dynamic Resource Provisioning for Iterative Workloads on Apache Spark.
IEEE Trans. Cloud Comput., 2023

PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems.
CoRR, 2023

ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection.
CoRR, 2023

IOPathTune: Adaptive Online Parameter Tuning for Parallel File System I/O Path.
CoRR, 2023

A Reinforcement Learning Based Backfilling Strategy for HPC Batch Jobs.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

DGAP: Efficient Dynamic Graph Analysis on Persistent Memory.
Proceedings of the International Conference for High Performance Computing, 2023

Drill: Log-based Anomaly Detection for Large-scale Storage Systems Using Source Code Analysis.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

FaultyRank: A Graph-based Parallel File System Checker.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Early Exploration of Using ChatGPT for Log-based Anomaly Detection on Parallel File Systems Logs.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

2022
A performance study of optane persistent memory: from storage data structures' perspective.
CCF Trans. High Perform. Comput., December, 2022

A Study of Failure Recovery and Logging of High-Performance Parallel File Systems.
ACM Trans. Storage, 2022

SchedInspector: A Batch Job Scheduling Inspector Using Reinforcement Learning.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

ClusterLog: Clustering Logs for Effective Log-based Anomaly Detection.
Proceedings of the 12th IEEE/ACM Workshop on Fault Tolerance for HPC at eXtreme Scale, 2022

VCSR: Mutable CSR Graph Format Using Vertex-Centric Packed Memory Array.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
Trigger-Based Incremental Data Processing with Unified Sync and Async Model.
IEEE Trans. Cloud Comput., 2021

I/O characteristic discovery for storage system optimizations.
J. Parallel Distributed Comput., 2021

SentiLog: Anomaly Detecting on Parallel File Systems via Log-based Sentiment Analysis.
Proceedings of the HotStorage '21: 13th ACM Workshop on Hot Topics in Storage and File Systems, 2021

2020
PRS: A Pattern-Directed Replication Scheme for Heterogeneous Object-Based Storage.
IEEE Trans. Computers, 2020

RLScheduler: an automated HPC batch job scheduler using reinforcement learning.
Proceedings of the International Conference for High Performance Computing, 2020

Understand the overheads of storage data structures on persistent memory.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

2019
Managing Rich Metadata in High-Performance Computing Systems Using a Graph Model.
IEEE Trans. Parallel Distributed Syst., 2019

Client-side straggler-aware I/O scheduler for object-based parallel file systems.
Parallel Comput., 2019

Vectorizing disks blocks for efficient storage system via deep learning.
Parallel Comput., 2019

RLScheduler: Learn to Schedule HPC Batch Jobs Using Deep Reinforcement Learning.
CoRR, 2019

A Performance Study of Lustre File System Checker: Bottlenecks and Potentials.
Proceedings of the 35th Symposium on Mass Storage Systems and Technologies, 2019

A Comparative Study of Large-Scale Cluster Workload Traces via Multiview Analysis.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Modeling HPC Storage Performance Using Long Short-Term Memory Networks.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

2018
A Cross-Layer Solution in Scientific Workflow System for Tackling Data Movement Challenge.
CoRR, 2018

A Software-Defined Approach for QoS Control in High-Performance Computing Storage Systems.
CoRR, 2018

GRAM: A GPU-Based Property Graph Traversal and Query for HPC Rich Metadata Management.
Proceedings of the Network and Parallel Computing, 2018

PFault: A General Framework for Analyzing the Reliability of High-Performance Parallel File Systems.
Proceedings of the 32nd International Conference on Supercomputing, 2018

AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems.
Proceedings of the 18th IEEE/ACM International Symposium on Cluster, 2018

I/O Characteristics Discovery in Cloud Storage Systems.
Proceedings of the 11th IEEE International Conference on Cloud Computing, 2018

2017
SuperMIC: Analyzing Large Biological Datasets in Bioinformatics with Maximal Information Coefficient.
IEEE ACM Trans. Comput. Biol. Bioinform., 2017

DAAC Workshop Chairs' Welcome.
Proceedings of the Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

POSTER: IOGP: An Incremental Online Graph Partitioning for Large-Scale Distributed Graph Databases.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

IOGP: An Incremental Online Graph Partitioning Algorithm for Distributed Graph Databases.
Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

Pattern-Directed Replication Scheme for Heterogeneous Object-based Storage.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Lightweight Provenance Service for High-Performance Computing.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
PUMA: From Simultaneous to Parallel for Shared Memory System in Multi-core.
J. Signal Process. Syst., 2016

An asynchronous traversal engine for graph-based rich metadata management.
Parallel Comput., 2016

A Generic Framework for Testing Parallel File Systems.
Proceedings of the 1st Joint International Workshop on Parallel Data Storage and data Intensive Scalable Computing Systems, 2016

Log-Assisted Straggler-Aware I/O Scheduler for High-End Computing.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

Block2Vec: A Deep Learning Strategy on Mining Block Correlations in Storage Systems.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

GraphMeta: A Graph-Based Engine for Managing Large-Scale HPC Rich Metadata.
Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015
GraphTrek: Asynchronous Graph Traversal for Property Graph-Based Metadata Management.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
PseudoNUMA for reducing memory interference in multi-core systems.
Proceedings of the 2014 Spring Simulation Multiconference, 2014

Unbinds data and tasks to improving the Hadoop performance.
Proceedings of the 15th IEEE/ACIS International Conference on Software Engineering, 2014

Using property graphs for rich metadata management in HPC systems.
Proceedings of the 9th Parallel Data Storage Workshop, 2014

Two-Choice Randomized Dynamic I/O Scheduler for Object Storage Systems.
Proceedings of the International Conference for High Performance Computing, 2014

PUMA: Pseudo unified memory architecture for single-ISA heterogeneous multi-core systems.
Proceedings of the 2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications, 2014

DLBS: Decentralized load balancing scheme for event-driven cloud frameworks.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

Combine thread with memory scheduling for maximizing performance in multi-core systems.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

An Adaptive Auto-configuration Tool for Hadoop.
Proceedings of the 2014 19th International Conference on Engineering of Complex Computer Systems, 2014

Bwasw-Cloud: Efficient sequence alignment algorithm for two big data with MapReduce.
Proceedings of the Fifth International Conference on the Applications of Digital Information and Web Technologies, 2014

Temperature-Aware Scheduling Based on Dynamic Time-Slice Scaling.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2014

Domino: an incremental computing framework in cloud with eventual synchronization.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Provenance-Based Prediction Scheme for Object Storage System in HPC.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Provenance-based object storage prediction scheme for scientific big data applications.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

2013
Detecting Associations in Large Dataset on MapReduce.
Proceedings of the 12th IEEE International Conference on Trust, 2013

Coordinate Task and Memory Management for Improving Power Efficiency.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2013

Group Scheduling for Improving Both CPU and Memory Power Efficiency Simultaneously.
Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, 2013

HDFS+: Concurrent Writes Improvements for HDFS.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012
Phase Detection for Loop-Based Programs on Multicore Architectures.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Cloud Based Short Read Mapping Service.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Sedna: A Memory Based Key-Value Storage System for Realtime Processing in Cloud.
Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops, 2012

