Dingwen Tao

ORCID: 0000-0001-5422-4497

Affiliations:
  • Indiana University, Bloomington, IN, USA


According to our database, Dingwen Tao authored at least 113 papers between 2014 and 2025.

Bibliography

2025
Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing.
Future Gener. Comput. Syst., 2025

2024
BIRD+: Design of a Lightweight Communication Compressor for Resource-Constrained Distribution Learning Platforms.
IEEE Trans. Parallel Distributed Syst., November, 2024

PVII: A pedestrian-vehicle interactive and iterative prediction framework for pedestrian's trajectory.
Appl. Intell., October, 2024

TAC+: Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations.
IEEE Trans. Parallel Distributed Syst., March, 2024

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-point Data.
Proc. VLDB Endow., February, 2024

LCP: Enhancing Scientific Data Management with Lossy Compression for Particles.
CoRR, 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training.
CoRR, 2024

Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework.
CoRR, 2024

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
CoRR, 2024

FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources.
CoRR, 2024

A Survey on Error-Bounded Lossy Compression for Scientific Datasets.
CoRR, 2024

Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

A High-Quality Workflow for Multi-Resolution Scientific Data Reduction and Visualization.
Proceedings of the International Conference for High Performance Computing, 2024

Accelerating Communication in Deep Learning Recommendation Model Training with Dual-Level Adaptive Lossy Compression.
Proceedings of the International Conference for High Performance Computing, 2024

cuSZ-i: High-Ratio Scientific Lossy Compression on GPUs with Optimized Multi-Level Interpolation.
Proceedings of the International Conference for High Performance Computing, 2024

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data.
Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures, 2024

Concealing Compression-accelerated I/O for HPC Applications through In Situ Task Scheduling.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

Machete: An Efficient Lossy Floating-Point Compressor Designed for Time Series Databases.
Proceedings of the Data Compression Conference, 2024

MASC: A Memory-Efficient Adjoint Sensitivity Analysis through Compression Using Novel Spatiotemporal Prediction.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Understanding Mixed Precision GEMM with MPGemmFI: Insights into Fault Resilience.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023
SZ3: A Modular Framework for Composing Prediction-Based Error-Bounded Lossy Compressors.
IEEE Trans. Big Data, April, 2023

Design of a Quantization-Based DNN Delta Compression Framework for Model Snapshots and Federated Learning.
IEEE Trans. Parallel Distributed Syst., March, 2023

cuSZ-I: High-Fidelity Error-Bounded Lossy Compression for Scientific Data on GPUs.
CoRR, 2023

MPGemmFI: A Fault Injection Technique for Mixed Precision GEMM in ML Applications.
CoRR, 2023

TAC+: Drastically Optimizing Error-Bounded Lossy Compression for 3D AMR Simulations.
CoRR, 2023

MEMQSim: Highly Memory-Efficient and Modularized Quantum State-Vector Simulation.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

AMRIC: A Novel In Situ Lossy Compression Framework for Efficient I/O in Adaptive Mesh Refinement Applications.
Proceedings of the International Conference for High Performance Computing, 2023

Analyzing Impact of Data Reduction Techniques on Visualization for AMR Applications Using AMReX Framework.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Demystifying and Mitigating Cross-Layer Deficiencies of Soft Error Protection in Instruction Duplication.
Proceedings of the International Conference for High Performance Computing, 2023

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023

Carbon Emissions of Quantum Circuit Simulation: More than You Would Think.
Proceedings of the 14th International Green and Sustainable Computing Conference, 2023

GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023

Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training.
Proceedings of the 37th International Conference on Supercomputing, 2023

HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023

FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Optimizing Error-Bounded Lossy Compression for Scientific Data With Diverse Constraints.
IEEE Trans. Parallel Distributed Syst., 2022

MGARD+: Optimizing Multilevel Methods for Error-Bounded Scientific Data Reduction.
IEEE Trans. Computers, 2022

Speculative Container Scheduling for Deep Learning Applications in a Kubernetes Cluster.
IEEE Syst. J., 2022

Toward Quantity-of-Interest Preserving Lossy Compression for Scientific Data.
Proc. VLDB Endow., 2022

SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates.
CoRR, 2022

Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations.
CoRR, 2022

SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets.
CoRR, 2022

SIMD Lossy Compression for Scientific Data.
CoRR, 2022

Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Efficient Error-Bounded Lossy Compression for CPU Architectures.
Proceedings of the 30th International Symposium on Modeling, 2022

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

CEAZ: accelerating parallel I/O via hardware-algorithm co-designed adaptive lossy compression.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Bring orders into uncertainty: enabling efficient uncertain graph processing via novel path sampling on multi-accelerator systems.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Improving Prediction-Based Lossy Compression Dramatically via Ratio-Quality Modeling.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Ultrafast Error-bounded Lossy Compression for Scientific Datasets.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

TAC: Optimizing Error-Bounded Lossy Compression for Three-Dimensional Adaptive Mesh Refinement Simulations.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

H-GCN: A Graph Convolutional Network Accelerator on Versal ACAP Architecture.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

2021
COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression.
Proc. VLDB Endow., 2021

TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs.
J. Parallel Distributed Comput., 2021

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression.
CoRR, 2021

cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs.
CoRR, 2021

Understanding Effectiveness of Multi-Error-Bounded Lossy Compression for Preserving Ranges of Interest in Scientific Analysis.
Proceedings of the 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data, 2021

An efficient uncertain graph processing framework for heterogeneous architectures.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

A novel memory-efficient deep learning training framework via error-bounded lossy compression.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

ClickTrain: efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Adaptive Configuration of In Situ Lossy Compression for Cosmology Simulations via Fine-Grained Rate-Quality Modeling.
Proceedings of the HPDC '21: The 30th International Symposium on High-Performance Parallel and Distributed Computing, 2021

Optimizing Multi-Range based Error-Bounded Lossy Compression for Scientific Datasets.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Exploring Autoencoder-based Error-bounded Compression for Scientific Data.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Characterizing Impacts of Storage Faults on HPC Applications: A Methodology and Insights.
Proceedings of the IEEE International Conference on Cluster Computing, 2021

Improving Lossy Compression for SZ by Exploring the Best-Fit Lossless Compression Techniques.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

2020
Performance Optimization for Relative-Error-Bounded Lossy Compression on Scientific Data.
IEEE Trans. Parallel Distributed Syst., 2020

An Efficient End-to-End Deep Learning Training Framework via Fine-Grained Pattern-Based Pruning.
CoRR, 2020

MGARD+: Optimizing Multi-grid Based Reduction for Efficient Scientific Data Management.
CoRR, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
CoRR, 2020

ISM2: Optimizing Irregular-Shaped Matrix-Matrix Multiplication on GPUs.
CoRR, 2020

waveSZ: a hardware-algorithm co-design of efficient lossy compression for scientific data.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Understanding GPU-Based Lossy Compression for Extreme-Scale Cosmological Simulations.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Delta-DNN: Efficiently Compressing Deep Neural Networks via Exploiting Floats Similarity.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Significantly Improving Lossy Compression for HPC Datasets with Second-Order Prediction and Parameter Optimization.
Proceedings of the HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, 2020

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

SDRBench: Scientific Data Reduction Benchmark for Lossy Compressors.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

LCFI: A Fault Injection Tool for Studying Lossy Compression Error Propagation in HPC Programs.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP.
IEEE Trans. Parallel Distributed Syst., 2019

Efficient Lossy Compression for Scientific Data Based on Pointwise Relative Error Bound.
IEEE Trans. Parallel Distributed Syst., 2019

Z-checker: A framework for assessing lossy compression of scientific data.
Int. J. High Perform. Comput. Appl., 2019

Use cases of lossy compression for floating-point data in scientific data sets.
Int. J. High Perform. Comput. Appl., 2019

Significantly improving lossy compression quality based on an optimized hybrid prediction model.
Proceedings of the International Conference for High Performance Computing, 2019

Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms.
Proceedings of the 35th Symposium on Mass Storage Systems and Technologies, 2019

TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs.
Proceedings of the ACM International Conference on Supercomputing, 2019

DeepSZ: A Novel Framework to Compress Deep Neural Networks by Using Error-Bounded Lossy Compression.
Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, 2019

Accelerating Lossy Compression on HPC Datasets via Partitioning Computation for Parallel Processing.
Proceedings of the 21st IEEE International Conference on High Performance Computing and Communications; 17th IEEE International Conference on Smart City; 5th IEEE International Conference on Data Science and Systems, 2019

Improving Performance of Data Dumping with Lossy Compression for Scientific Simulation.
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019

Elastic Executor Provisioning for Iterative Workloads on Apache Spark.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

Progress-based Container Scheduling for Short-lived Applications in a Kubernetes Cluster.
Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018
Fault Tolerance for Iterative Methods in High-Performance Computing.
PhD thesis, 2018

Fault tolerant one-sided matrix decompositions on heterogeneous systems with GPUs.
Proceedings of the International Conference for High Performance Computing, 2018

Improving performance of iterative methods by lossy checkpointing.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

Fixed-PSNR Lossy Compression for Scientific Data.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

An Efficient Transformation Scheme for Lossy Data Compression with Point-Wise Relative Error Bound.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

PaSTRI: Error-Bounded Lossy Compression for Two-Electron Integrals in Quantum Chemistry.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Error-Controlled Lossy Compression Optimized for High Compression Ratios of Scientific Datasets.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Exploration of Pattern-Matching Techniques for Lossy Compression on Cosmology Simulation Data Sets.
Proceedings of the High Performance Computing, 2017

Correcting soft errors online in fast fourier transform.
Proceedings of the International Conference for High Performance Computing, 2017

Silent Data Corruption Resilient Two-sided Matrix Factorizations.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

In-depth exploration of single-snapshot lossy compression techniques for N-body simulations.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
GreenLA: green linear algebra software for GPU-accelerated heterogeneous computing.
Proceedings of the International Conference for High Performance Computing, 2016

Towards Practical Algorithm Based Fault Tolerance in Dense Linear Algebra.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

New-Sum: A Novel Online ABFT Scheme For General Iterative Methods.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2014
Extending checksum-based ABFT to tolerate soft errors online in iterative methods.
Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems, 2014

