Bogdan Nicolae

ORCID: 0000-0002-0661-7509

According to our database, Bogdan Nicolae authored at least 103 papers between 2008 and 2025.

Bibliography

2025
Efficient distributed continual learning for steering experiments in real-time.
Future Gener. Comput. Syst., 2025

2024
Scalable I/O aggregation for asynchronous multi-level checkpointing.
Future Gener. Comput. Syst., 2024

Wilkins: HPC In Situ Workflows Made Easy.
CoRR, 2024

Towards Affordable Reproducibility Using Scalable Capture and Comparison of Intermediate Multi-Run Results.
Proceedings of the 25th International Middleware Conference, 2024

Deep Optimizer States: Towards Scalable Training of Transformer Models using Interleaved Offloading.
Proceedings of the 25th International Middleware Conference, 2024

Viper: A High-Performance I/O Framework for Transparently Updating, Storing, and Transferring Deep Neural Network Models.
Proceedings of the 53rd International Conference on Parallel Processing, 2024

EvoStore: Towards Scalable Storage of Evolving Learning Models.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers.
Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 2024

Diaspora: Resilience-Enabling Services for Real-Time Distributed Workflows.
Proceedings of the 20th IEEE International Conference on e-Science, 2024

Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

2023
Understanding Patterns of Deep Learning Model Evolution in Network Architecture Search.
CoRR, 2023

Elastic deep learning through resilient collective operations.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Modeling Multi-Threaded Aggregated I/O for Asynchronous Checkpointing on HPC Systems.
Proceedings of the 22nd International Symposium on Parallel and Distributed Computing, 2023

LowFive: In Situ Data Transport for High-Performance Workflows.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

DStore: A Lightweight Scalable Learning Model Repository with Fine-Grain Tensor-Level Access.
Proceedings of the 37th International Conference on Supercomputing, 2023

Scalable Incremental Checkpointing using GPU-Accelerated De-Duplication.
Proceedings of the 52nd International Conference on Parallel Processing, 2023

GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

Understanding Patterns of Deep Learning Model Evolution in Network Architecture Search.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

Towards Efficient I/O Pipelines Using Accumulated Compression.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

Optimizing the Training of Co-Located Deep Learning Models Using Cache-Aware Staggering.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

MPIGDB: A Flexible Debugging Infrastructure for MPI Programs.
Proceedings of the 13th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 2023

Building the I (Interoperability) of FAIR for Performance Reproducibility of Large-Scale Composable Workflows in RECUP.
Proceedings of the 19th IEEE International Conference on e-Science, 2023

2022
Scalable Multi-Versioning Ordered Key-Value Stores with Persistent Memory Support.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Lobster: Load Balance-Aware I/O for Distributed DNN Training.
Proceedings of the 51st International Conference on Parallel Processing, 2022

FlexScience'22: 12th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

Towards Efficient Cache Allocation for High-Frequency Checkpointing.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Large Scale Caching and Streaming of Training Data for Online Deep Learning.
Proceedings of the 12th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures (FlexScience '22), 2022

Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

Towards Low-Overhead Resilience for Data Parallel Deep Learning.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
Demystifying asynchronous I/O Interference in HPC applications.
Int. J. High Perform. Comput. Appl., 2021

Towards Aggregated Asynchronous Checkpointing.
CoRR, 2021

VELOC: VEry Low Overhead Checkpointing in the Age of Exascale.
CoRR, 2021

Dynamic Heterogeneous Task Specification and Execution for In Situ Workflows.
Proceedings of the 2021 IEEE Workshop on Workflows in Support of Large-Scale Science (WORKS), 2021

Braid-DB: Toward AI-Driven Science with Machine Learning Provenance.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

High-Performance Ptychographic Reconstruction with Federated Facilities.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021

Towards Efficient I/O Scheduling for Collaborative Multi-Level Checkpointing.
Proceedings of the 29th International Symposium on Modeling, 2021

Towards High Performance Resilience Using Performance Portable Abstractions.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

Virtual Log-Structured Storage for High-Performance Streaming.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Accelerating DNN Architecture Search at Scale Using Selective Weight Transfer.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Shared-Memory Communication for Containerized Workflows.
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021

2020
DataStates: Towards Lightweight Data Models for Deep Learning.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

High-bypass Learning: Automated Detection of Tumor Cells That Significantly Impact Drug Response.
Proceedings of the 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2020

Optimizing Asynchronous Multi-Level Checkpoint/Restart Configurations with Machine Learning.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

CoSim: A Simulator for Co-Scheduling of Batch and On-Demand Jobs in HPC Datacenters.
Proceedings of the 24th IEEE/ACM International Symposium on Distributed Simulation and Real Time Applications, 2020

DeepClone: Lightweight State Replication of Deep Learning Models for Data Parallel Training.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

DeepFreeze: Towards Scalable Asynchronous Checkpointing of Deep Learning Models.
Proceedings of the 20th IEEE/ACM International Symposium on Cluster, 2020

2019
Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY.
IEEE Access, 2019

Significantly improving lossy compression quality based on an optimized hybrid prediction model.
Proceedings of the International Conference for High Performance Computing, 2019

Understanding Scalability and Fine-Grain Parallelism of Synchronous Data Parallel Training.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019

VeloC: Towards High Performance Adaptive Asynchronous Checkpointing at Large Scale.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Session details: Session 1: Converged Computing Infrastructures.
Proceedings of the 10th Workshop on Scientific Cloud Computing, 2019

Towards Portable Online Prediction of Network Utilization Using MPI-Level Monitoring.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019

Improving Performance of Data Dumping with Lossy Compression for Scientific Simulation.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

2018
Performance Model of MapReduce Iterative Applications for Hybrid Cloud Bursting.
IEEE Trans. Parallel Distributed Syst., 2018

KerA: Scalable Data Ingestion for Stream Processing.
Proceedings of the 38th IEEE International Conference on Distributed Computing Systems, 2018

Spark-DIY: A Framework for Interoperable Spark Operations with High Performance Block-Based Data Models.
Proceedings of the 5th IEEE/ACM International Conference on Big Data Computing Applications and Technologies, 2018

2017
Leveraging Adaptive I/O to Optimize Collective Data Shuffling Patterns for Big Data Analytics.
IEEE Trans. Parallel Distributed Syst., 2017

Exploring Shared State in Key-Value Store for Window-Based Multi-Pattern Streaming Analytics.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Evaluation of Data Locality Strategies for Hybrid Cloud Bursting of Iterative MapReduce.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Towards a unified storage and ingestion architecture for stream processing.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
Guest Editors' Introduction: Special Issue on Scientific Cloud Computing.
IEEE Trans. Cloud Comput., 2016

Towards scalable on-demand collective data access in IaaS clouds: An adaptive collaborative content exchange proposal.
J. Parallel Distributed Comput., 2016

Data Multiverse: The Uncertainty Challenge of Future Big Data Analytics.
Proceedings of the Semantic Keyword-Based Search on Structured Data Sources, 2016

Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

On exploiting data locality for iterative mapreduce applications in hybrid clouds.
Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing Applications and Technologies, 2016

2015
Towards Transparent Throughput Elasticity for IaaS Cloud Storage: Exploring the Benefits of Adaptive Block-Level Caching.
Int. J. Distributed Syst. Technol., 2015

Enabling Big Data Analytics in the Hybrid Cloud Using Iterative MapReduce.
Proceedings of the 8th IEEE/ACM International Conference on Utility and Cloud Computing, 2015

Leveraging Naturally Distributed Data Redundancy to Reduce Collective I/O Replication Overhead.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Towards efficient on-demand VM provisioning: Study of VM runtime I/O access patterns to shared image content.
Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management, 2015

Techniques to improve the scalability of collective checkpointing at large scale.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015

Discovering and Leveraging Content Similarity to Optimize Collective on-Demand Data Access to IaaS Cloud Storage.
Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014
Transparent Throughput Elasticity for IaaS Cloud Storage Using Guest-Side Block-Level Caching.
Proceedings of the 7th IEEE/ACM International Conference on Utility and Cloud Computing, 2014

To overlap or not to overlap: optimizing incremental MapReduce computations for on-demand data upload.
Proceedings of the 5th International Workshop on Data-Intensive Computing in the Clouds, 2014

Bursting the Cloud Data Bubble: Towards Transparent Storage Elasticity in IaaS Clouds.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Next Generation HPC Clouds: A View for Large-Scale Scientific and Data-Intensive Applications.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
BlobCR: Virtual disk based checkpoint-restart for HPC applications on IaaS clouds.
J. Parallel Distributed Comput., 2013

Scalable data management for map-reduce-based data-intensive applications: a view for cloud and hybrid infrastructures.
Int. J. Cloud Comput., 2013

Towards Scalable Checkpoint Restart: A Collective Inline Memory Contents Deduplication Proposal.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

AI-Ckpt: leveraging memory access patterns for adaptive asynchronous incremental checkpointing.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Leveraging Collaborative Content Exchange for On-Demand VM Multi-deployments in IaaS Clouds.
Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Understanding Vertical Scalability of I/O Virtualization for MapReduce Workloads: Challenges and Opportunities.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

2012
Towards scalable array-oriented active storage: the pyramid approach.
ACM SIGOPS Oper. Syst. Rev., 2012

A hybrid local storage transfer scheme for live migration of I/O intensive workloads.
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing, 2012

Scalable Reed-Solomon-Based Reliable Local Storage for HPC Applications on IaaS Clouds.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
On the Benefits of Transparent Compression for Cost-Effective Cloud Data Storage.
Trans. Large Scale Data Knowl. Centered Syst., 2011

BlobSeer: Next-generation data management for large scale infrastructures.
J. Parallel Distributed Comput., 2011

BlobCR: efficient checkpoint-restart for HPC applications on IaaS clouds using virtual disk image snapshots.
Proceedings of the Conference on High Performance Computing Networking, 2011

Going back and forth: efficient multideployment and multisnapshotting on clouds.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Optimizing Multi-deployment on Clouds by Means of Self-adaptive Prefetching.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Efficient Support for MPI-I/O Atomicity Based on Versioning.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
BlobSeer: Towards efficient data storage management for large-scale, distributed systems.
PhD thesis, 2010

BlobSeer: Bringing high throughput under heavy concurrency to Hadoop Map-Reduce applications.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

BlobSeer: Efficient data management for data-intensive applications distributed at large-scale.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

High Throughput Data-Compression for Cloud Storage.
Proceedings of the Data Management in Grid and Peer-to-Peer Systems, 2010

Using Global Behavior Modeling to Improve QoS in Cloud Data Storage Services.
Proceedings of the Second International Conference on Cloud Computing, 2010

2009
Towards a Grid File System Based on a Large-Scale BLOB Management Service.
Proceedings of the Grids, 2009

Enabling High Data Throughput in Desktop Grids through Decentralized Data and Metadata Management: The BlobSeer Approach.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

BlobSeer: how to enable efficient versioning for large object storage under heavy access concurrency.
Proceedings of the 2009 EDBT/ICDT Workshops, Saint-Petersburg, Russia, March 22, 2009, 2009

2008
Distributed Management of Massive Data: An Efficient Fine-Grain Data Access Scheme.
Proceedings of the High Performance Computing for Computational Science, 2008

Enabling lock-free concurrent fine-grain access to massive distributed data: Application to supernovae detection.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008