Dong Li

ORCID: 0000-0001-9336-0694

Affiliations:
  • University of California, Merced, CA, USA
  • Oak Ridge National Laboratory, TN, USA (former)
  • Virginia Tech, Department of Computer Science, Blacksburg, VA, USA (former)


According to our database, Dong Li authored at least 110 papers between 2008 and 2024.

Bibliography

2024
Exploring and Evaluating Real-world CXL: Use Cases and System Adoption.
CoRR, 2024

FlexMem: Adaptive Page Profiling and Migration for Tiered Memory.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

Performance Study of CXL Memory Topology.
Proceedings of the International Symposium on Memory Systems, 2024

Enabling Large Dynamic Neural Network Training with Learning-based Memory Management.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2024

MTM: Rethinking Memory Profiling and Migration for Multi-Tiered Large Memory.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

2023
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation.
Parallel Comput., 2023

iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.
IEEE Data Eng. Bull., 2023

HM-Keeper: Scalable Page Management for Multi-Tiered Large Memory Systems.
CoRR, 2023

Merchandiser: Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications with Load-Balance Awareness.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Auto-HPCnet: An Automatic Framework to Build Neural Network-based Surrogate for High-Performance Computing Applications.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
TLB-pilot: Mitigating TLB Contention Attack on GPUs with Microarchitecture-Aware Scheduling.
ACM Trans. Archit. Code Optim., 2022

Campo: Cost-Aware Performance Optimization for Mixed-Precision Neural Network Training.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

LB-HM: load balance-aware data placement on heterogeneous memory for task-parallel HPC applications.
Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Lobster: Load Balance-Aware I/O for Distributed DNN Training.
Proceedings of the 51st International Conference on Parallel Processing, 2022

Large Scale Caching and Streaming of Training Data for Online Deep Learning.
Proceedings of the FlexScience '22: Proceedings of the 12th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 2022

2021
Trust: Triangle Counting Reloaded on GPUs.
IEEE Trans. Parallel Distributed Syst., 2021

Efficient Buffer Overflow Detection on GPU.
IEEE Trans. Parallel Distributed Syst., 2021

Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation.
Proc. VLDB Endow., 2021

PARIS: Predicting application resilience using machine learning.
J. Parallel Distributed Comput., 2021

Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing.
J. Comput. Sci. Technol., 2021

Preface.
J. Comput. Sci. Technol., 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training.
Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors.
Proceedings of the 6th IEEE/ACM Symposium on Edge Computing, 2021

MD-HM: memoization-based molecular dynamics simulations on big memory system.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Athena: high-performance sparse tensor contraction sequence on heterogeneous memory.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

ArchTM: Architecture-Aware, High Performance Transaction for Persistent Memory.
Proceedings of the 19th USENIX Conference on File and Storage Technologies, 2021

Tahoe: tree structure-aware high performance inference engine for decision tree ensemble on GPU.
Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021

Fast, flexible, and comprehensive bug detection for persistent memory programs.
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020
FLAME: A Self-Adaptive Auto-labeling System for Heterogeneous Mobile Processors.
CoRR, 2020

Exploration on Routing Configuration of HNoC With Intelligent On-Chip Resource Management.
IEEE Access, 2020

Smart-PGSim: using neural network to accelerate AC-OPF power grid simulation.
Proceedings of the International Conference for High Performance Computing, 2020

RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices.
Proceedings of the 2020 USENIX Conference on Operational Machine Learning, 2020

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

MATCH: An MPI Fault Tolerance Benchmark Suite.
Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

Ribbon: High Performance Cache Line Flushing for Persistent Memory.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning.
CoRR, 2019

EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures.
CoRR, 2019

Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2.
CoRR, 2019

Architecture-Aware, High Performance Transaction for Persistent Memory.
CoRR, 2019

UMap: Enabling Application-driven Optimizations for Page Management.
Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019

Adaptive neural network-based approximation to accelerate eulerian fluid simulation.
Proceedings of the International Conference for High Performance Computing, 2019

Opera: Similarity Analysis on Data Access Patterns of OpenMP Tasks to Optimize Task Affinity.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

MOARD: Modeling Application Resilience to Transient Faults on Data Objects.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Performance Analysis and Characterization of Training Deep Learning Models on Mobile Device.
Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019

2018
A Preliminary Study of Neural Network-based Approximation for HPC Applications.
CoRR, 2018

Characterization and Comparison of Application Resilience for Serial and Parallel Executions.
CoRR, 2018

Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs.
Proceedings of the International Conference for High Performance Computing, 2018

Understanding Application Recomputability without Crash Consistency in Non-Volatile Memory.
Proceedings of the Workshop on Memory Centric High Performance Computing, 2018

FlipTracker: understanding natural error resilience in HPC applications.
Proceedings of the International Conference for High Performance Computing, 2018

Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Modeling Application Resilience in Large-scale Parallel Execution.
Proceedings of the 47th International Conference on Parallel Processing, 2018

GMOD: a dynamic GPU memory overflow detector.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Optimizing Data Placement on GPU Memory: A Portable Approach.
IEEE Trans. Computers, 2017

Early Evaluation of Intel Optane Non-Volatile Memory with HPC I/O Workloads.
CoRR, 2017

Unimem: Runtime Data Management on Non-Volatile Memory-based Heterogeneous Main Memory.
CoRR, 2017

High Performance Data Persistence in Non-Volatile Memory for Resilient High Performance Computing.
CoRR, 2017

Application-Level Resilience Modeling for HPC Fault Tolerance.
CoRR, 2017

Unimem: runtime data management on non-volatile memory-based heterogeneous main memory.
Proceedings of the International Conference for High Performance Computing, 2017

Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory.
Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

Exploring Synchronization in Cache Coherent Manycore Systems: A Case Study with Xeon Phi.
Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Algorithm-Directed Crash Consistence in Non-volatile Memory for HPC.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016
Integrated Thermal Analysis for Processing In Die-Stacking Memory.
Proceedings of the Second International Symposium on Memory Systems, 2016

Performance Implications of Processing-in-Memory Designs on Data-Intensive Applications.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015
A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-Volatile On-Chip Caches.
IEEE Trans. Parallel Distributed Syst., 2015

Enabling Portable Optimizations of Data Placement on GPU.
IEEE Micro, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Fast Fault Injection and Sensitivity Analysis for Collective Communications.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014
Quantitatively Modeling Application Resilience with the Data Vulnerability Factor.
Proceedings of the International Conference for High Performance Computing, 2014

Application characterization using Oxbow toolkit and PADS infrastructure.
Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

LastingNVCache: A Technique for Improving the Lifetime of Non-volatile Caches.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON.
Proceedings of the International Conference on Computational Science, 2014

Improving energy efficiency of embedded DRAM caches for high-end computing systems.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

WriteSmoothing: improving lifetime of non-volatile caches using intra-set wear-leveling.
Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Strategies for Energy-Efficient Resource Management of Hybrid Programming Models.
IEEE Trans. Parallel Distributed Syst., 2013

Quantifying Architectural Requirements of Contemporary Extreme-Scale Scientific Applications.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach.
Proceedings of the International Conference for High Performance Computing, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

HPPAC Introduction.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications.
Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Improving performance and energy efficiency of matrix multiplication via pipeline broadcast.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Critical path-based thread placement for NUMA systems.
SIGMETRICS Perform. Evaluation Rev., 2012

Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

PCM-Based Durable Write Cache for Fast Disk I/O.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Model-based, memory-centric performance and power optimization on NUMA multiprocessors.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Scalable and Energy Efficient Execution Methods for Multicore Systems.
PhD thesis, 2011

Scalable memory registration for high performance networks using helper threads.
Proceedings of the 8th Conference on Computing Frontiers, 2011

2010
PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications.
IEEE Trans. Parallel Distributed Syst., 2010

Power saving experiments for large-scale global optimisation.
Int. J. Parallel Emergent Distributed Syst., 2010

Hybrid MPI/OpenMP power-aware computing.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Power-aware MPI task aggregation prediction for high-end computing systems.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

System-Level, Unified In-band and Out-of-band Dynamic Thermal Control.
Proceedings of the 39th International Conference on Parallel Processing, 2010

2008
System-level, thermal-aware, fully-loaded process scheduling.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

CG-Cell: An NPB Benchmark Implementation on Cell Broadband Engine.
Proceedings of the Distributed Computing and Networking, 9th International Conference, 2008

