Dong Li

Ignacio Laguna

J. Parallel Distributed Comput., 2021

Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing.

J. Comput. Sci. Technol., 2021

Preface.

J. Comput. Sci. Technol., 2021

ZeRO-Offload: Democratizing Billion-Scale Model Training.

Samyam Rajbhandari

Reza Yazdani Aminabadi

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Sparta: high-performance, element-wise sparse tensor contraction on heterogeneous memory.

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors.

Proceedings of the 6th IEEE/ACM Symposium on Edge Computing, 2021

MD-HM: memoization-based molecular dynamics simulations on big memory system.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Optimizing large-scale plasma simulations on persistent memory-based heterogeneous memory with effective data placement across memory hierarchy.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Athena: high-performance sparse tensor contraction sequence on heterogeneous memory.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Enabling energy-efficient DNN training on hybrid GPU-FPGA accelerators.

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

ArchTM: Architecture-Aware, High Performance Transaction for Persistent Memory.

Proceedings of the 19th USENIX Conference on File and Storage Technologies, 2021

Tahoe: tree structure-aware high performance inference engine for decision tree ensemble on GPU.

Proceedings of the EuroSys '21: Sixteenth European Conference on Computer Systems, 2021

Fast, flexible, and comprehensive bug detection for persistent memory programs.

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

FLAME: A Self-Adaptive Auto-labeling System for Heterogeneous Mobile Processors.

CoRR, 2020

Exploration on Routing Configuration of HNoC With Intelligent On-Chip Resource Management.

Juan Fang

Zeqing Chang

IEEE Access, 2020

Smart-PGSim: using neural network to accelerate AC-OPF power grid simulation.

Proceedings of the International Conference for High Performance Computing, 2020

RIANN: Real-time Incremental Learning with Approximate Nearest Neighbor on Mobile Devices.

Jiawen Liu

Zhen Xie

Proceedings of the 2020 USENIX Conference on Operational Machine Learning, 2020

HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory.

Minjia Zhang

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

MATCH: An MPI Fault Tolerance Benchmark Suite.

Giorgis Georgakoudis

Konstantinos Parasyris

Ignacio Laguna

Proceedings of the IEEE International Symposium on Workload Characterization, 2020

Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

Ribbon: High Performance Cache Line Flushing for Persistent Memory.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Sentinel: Runtime Data Management on Heterogeneous Main Memory Systems for Deep Learning.

CoRR, 2019

EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures.

CoRR, 2019

Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2.

CoRR, 2019

Architecture-Aware, High Performance Transaction for Persistent Memory.

CoRR, 2019

UMap: Enabling Application-driven Optimizations for Page Management.

Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019

Adaptive neural network-based approximation to accelerate eulerian fluid simulation.

Proceedings of the International Conference for High Performance Computing, 2019

Opera: Similarity Analysis on Data Access Patterns of OpenMP Tasks to Optimize Task Affinity.

Chunhua Liao

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

MOARD: Modeling Application Resilience to Transient Faults on Data Objects.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Performance Analysis and Characterization of Training Deep Learning Models on Mobile Device.

Proceedings of the 25th IEEE International Conference on Parallel and Distributed Systems, 2019

2018

A Preliminary Study of Neural Network-based Approximation for HPC Applications.

Wenqian Dong

CoRR, 2018

Characterization and Comparison of Application Resilience for Serial and Parallel Executions.

CoRR, 2018

Runtime data management on non-volatile memory-based heterogeneous memory for task-parallel programs.

Proceedings of the International Conference for High Performance Computing, 2018

Understanding Application Recomputability without Crash Consistency in Non-Volatile Memory.

Proceedings of the Workshop on Memory Centric High Performance Computing, 2018

FlipTracker: understanding natural error resilience in HPC applications.

Proceedings of the International Conference for High Performance Computing, 2018

Processing-in-Memory for Energy-Efficient Neural Network Training: A Heterogeneous Approach.

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Modeling Application Resilience in Large-scale Parallel Execution.

Proceedings of the 47th International Conference on Parallel Processing, 2018

GMOD: a dynamic GPU memory overflow detector.

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Optimizing Data Placement on GPU Memory: A Portable Approach.

IEEE Trans. Computers, 2017

Early Evaluation of Intel Optane Non-Volatile Memory with HPC I/O Workloads.

CoRR, 2017

Unimem: Runtime Data Management on Non-Volatile Memory-based Heterogeneous Main Memory.

Yingchao Huang

CoRR, 2017

High Performance Data Persistence in Non-Volatile Memory for Resilient High Performance Computing.

Yingchao Huang

CoRR, 2017

Application-Level Resilience Modeling for HPC Fault Tolerance.

Hanlin He

CoRR, 2017

Unimem: runtime data management on non-volatile memory-based heterogeneous main memory.

Yingchao Huang

Proceedings of the International Conference for High Performance Computing, 2017

Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory.

Proceedings of the 2017 International Conference on Networking, Architecture, and Storage, 2017

Exploring Synchronization in Cache Coherent Manycore Systems: A Case Study with Xeon Phi.

Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems, 2017

Algorithm-Directed Crash Consistence in Non-volatile Memory for HPC.

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

Integrated Thermal Analysis for Processing In Die-Stacking Memory.

Proceedings of the Second International Symposium on Memory Systems, 2016

Performance Implications of Processing-in-Memory Designs on Data-Intensive Applications.

Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015

A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-Volatile On-Chip Caches.

Manjunath Gorentla Venkata

IEEE Trans. Parallel Distributed Syst., 2015

Enabling Portable Optimizations of Data Placement on GPU.

IEEE Micro, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

Fast Fault Injection and Sensitivity Analysis for Collective Communications.

Kun Feng

Xian-He Sun

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

Quantitatively Modeling Application Resilience with the Data Vulnerability Factor.

Proceedings of the International Conference for High Performance Computing, 2014

Application characterization using Oxbow toolkit and PADS infrastructure.

Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

LastingNVCache: A Technique for Improving the Lifetime of Non-volatile Caches.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing.

Seyong Lee

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON.

Proceedings of the International Conference on Computational Science, 2014

Improving energy efficiency of embedded DRAM caches for high-end computing systems.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

WriteSmoothing: improving lifetime of non-volatile caches using intra-set wear-leveling.

Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Strategies for Energy-Efficient Resource Management of Hybrid Programming Models.

IEEE Trans. Parallel Distributed Syst., 2013

Quantifying Architectural Requirements of Contemporary Extreme-Scale Scientific Applications.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach.

Proceedings of the International Conference for High Performance Computing, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.

Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

HPPAC Introduction.

Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

A2E: Adaptively aggressive energy efficient DVFS scheduling for data intensive applications.

Proceedings of the IEEE 32nd International Performance Computing and Communications Conference, 2013

Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection.

Seyong Lee

Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Improving performance and energy efficiency of matrix multiplication via pipeline broadcast.

Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Critical path-based thread placement for NUMA systems.

Chun-Yi Su

Matthew Grove

SIGMETRICS Perform. Evaluation Rev., 2012

Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool.

Weikuan Yu

Proceedings of the SC Conference on High Performance Computing Networking, 2012

PCM-Based Durable Write Cache for Fast Disk I/O.

Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Model-based, memory-centric performance and power optimization on NUMA multiprocessors.

Chun-Yi Su

Edgar A. León

Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures.

Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011

Scalable and Energy Efficient Execution Methods for Multicore Systems.

PhD thesis, 2011

Scalable memory registration for high performance networks using helper threads.

Proceedings of the 8th Conference on Computing Frontiers, 2011

2010

PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications.

IEEE Trans. Parallel Distributed Syst., 2010

Power saving experiments for large-scale global optimisation.

Int. J. Parallel Emergent Distributed Syst., 2010

Hybrid MPI/OpenMP power-aware computing.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Power-aware MPI task aggregation prediction for high-end computing systems.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

System-Level, Unified In-band and Out-of-band Dynamic Thermal Control.

Rong Ge

Proceedings of the 39th International Conference on Parallel Processing, 2010

2008

System-level, thermal-aware, fully-loaded process scheduling.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

CG-Cell: An NPB Benchmark Implementation on Cell Broadband Engine.

Song Huang