2024
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors.
CoRR, 2024
A Survey on Error-Bounded Lossy Compression for Scientific Datasets.
CoRR, 2024
Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024
hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression.
Proceedings of the International Conference for High Performance Computing, 2024
POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
An Optimized Error-controlled MPI Collective Framework Integrated with Lossy Compression.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024
A Portable, Fast, DCT-based Compressor for AI Accelerators.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024
2023
gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters.
CoRR, 2023
C-Coll: Introducing Error-bounded Lossy Compression into MPI Collectives.
CoRR, 2023
cuSZp: An Ultra-fast GPU Error-bounded Lossy Compression Framework with Optimized End-to-End Performance.
Proceedings of the International Conference for High Performance Computing, 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023
Lightweight Huffman Coding for Efficient GPU Compression.
Proceedings of the 37th International Conference on Supercomputing, 2023
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs.
Proceedings of the 37th International Conference on Supercomputing, 2023
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs.
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023
2022
SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates.
CoRR, 2022
SZx: an Ultra-fast Error-bounded Lossy Compressor for Scientific Datasets.
CoRR, 2022
Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Ultrafast Error-bounded Lossy Compression for Scientific Datasets.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022
2021
Scalable and accurate multi-GPU based image reconstruction of large-scale ptychography data.
CoRR, 2021
cuSZ(x): Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs.
CoRR, 2021
High-Performance Ptychographic Reconstruction with Federated Facilities.
Proceedings of the Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation, 2021
Topology-aware optimizations for multi-GPU ptychographic image reconstruction.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions.
Proceedings of the IEEE International Conference on Cluster Computing, 2021
Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs.
Proceedings of the IEEE International Conference on Cluster Computing, 2021
2020
GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
2019
GPU-Based Iterative Medical CT Image Reconstructions.
J. Signal Process. Syst., 2019
Comparative Measurement of Cache Configurations' Impacts on Cache Timing Side-Channel Attacks.
Proceedings of the 12th USENIX Workshop on Cyber Security Experimentation and Test, 2019
2018
Novel meshes for multivariate interpolation and approximation.
Proceedings of the ACMSE 2018 Conference, Richmond, KY, USA, March 29-31, 2018, 2018
2017
A framework for fast and fair evaluation of automata processing hardware.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017
Demystifying automata processing: GPUs, FPGAs or Micron's AP?
Proceedings of the International Conference on Supercomputing, 2017
An Enhanced Image Reconstruction Tool for Computed Tomography on CPUs.
Proceedings of the Computing Frontiers Conference, 2017
Robotomata: A framework for approximate pattern matching of big data on an automata processor.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
2016
cuART: Fine-Grained Algebraic Reconstruction Technique for Computed Tomography Images on GPUs.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016
O3FA: A Scalable Finite Automata-based Pattern-Matching Engine for Out-of-Order Deep Packet Inspection.
Proceedings of the 2016 Symposium on Architectures for Networking and Communications Systems, 2016
2014
Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence.
IEEE J. Sel. Areas Commun., 2014
2013
Exploring different automata representations for efficient regular expression matching on GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013
GPU acceleration of regular expression matching for large datasets: exploring the implementation space.
Proceedings of the Computing Frontiers Conference, 2013