WorkloadDiff: Conditional Denoising Diffusion Probabilistic Models for Cloud Workload Prediction.
IEEE Trans. Cloud Comput., 2024
Globus service enhancements for exascale applications and facilities.
Int. J. High Perform. Comput. Appl., 2024
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors.
CoRR, 2024
Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments.
CoRR, 2024
S3LLM: Large-Scale Scientific Software Understanding with LLMs Using Source, Metadata, and Document.
Proceedings of the Computational Science - ICCS 2024, 2024
CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024
Model and Data Management for Machine Learning (M2ML): Integrating Instruments, Edge and HPC for Accelerated Machine Learning.
Proceedings of the IEEE International Conference on Big Data, 2024
A Distributed-GPU Deep Reinforcement Learning System for Solving Large Graph Optimization Problems.
ACM Trans. Parallel Comput., June 2023
Rapid detection of rare events from in situ X-ray diffraction data using machine learning.
CoRR, 2023
Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
Tomo2Mesh: Fast Porosity Mapping and Visualization for Synchrotron Tomography.
Proceedings of the 19th IEEE International Conference on e-Science, 2023
Investigating Code Generation Performance of ChatGPT with Crowdsourcing Social Data.
Proceedings of the 47th IEEE Annual Computers, Software, and Applications Conference, 2023
Toward Ultrahigh-Resolution E3SM Land Modeling on Exascale Computers.
Comput. Sci. Eng., 2022
Designing a parallel Feel-the-Way clustering algorithm on HPC systems.
Int. J. High Perform. Comput. Appl., 2021
OpenGraphGym-MG: Using Reinforcement Learning to Solve Large Graph Optimization Problems on MultiGPU Systems.
CoRR, 2021
Micromobility in Smart Cities: A Closer Look at Shared Dockless E-Scooters via Big Social Data.
Proceedings of the ICC 2021, 2021
OpenGraphGym: A Parallel Reinforcement Learning Framework for Graph Optimization Problems.
Proceedings of the Computational Science - ICCS 2020, 2020
FQL: An Extensible Feature Query Language and Toolkit on Searching Software Characteristics for HPC Applications.
Proceedings of the Tools and Techniques for High Performance Computing, 2019
XScan: An Integrated Tool for Understanding Open Source Community-Based Scientific Code.
Proceedings of the Computational Science - ICCS 2019, 2019
Fault Diagnosis Based on EEMD and Key Feature Representation with Separation of Stationary and Nonstationary Signals.
Proceedings of the CAA Symposium on Fault Detection, 2019
Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling.
Parallel Process. Lett., 2018
Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements.
Proceedings of the Machine Learning on HPC Environments, 2017
suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016