Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN.
CoRR, 2024
ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability.
Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2024
Optimizing Hyperplane Sweep Operations Using Asynchronous Multi-grain GPU Tasks.
Proceedings of the IEEE International Symposium on Workload Characterization, 2019
Adaptive Task Aggregation for High-Performance Sparse Solvers on GPUs.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019
Investigating Data Layout Transformations in Chapel.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
Taming irregular applications via advanced dynamic parallelism on GPUs.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018
Characterizing data organization effects on heterogeneous memory architectures.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017
MPI-ACC: Accelerator-Aware MPI for Scientific Applications.
IEEE Trans. Parallel Distributed Syst., 2016
MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL.
Parallel Comput., 2016
Implementing directed acyclic graphs with the heterogeneous system architecture.
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016
Programming High-Performance Clusters with Heterogeneous Computing Devices.
PhD thesis, 2015
Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015
Synchronization and Ordering Semantics in Hybrid MPI+GPU Programming.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Online Performance Projection for Clusters with Heterogeneous GPUs.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013
pVOCL: Power-Aware Dynamic Placement and Migration in Virtualized GPU Environments.
Proceedings of the IEEE 33rd International Conference on Distributed Computing Systems, 2013
On the efficacy of GPU-integrated MPI for scientific applications.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013
Contagion Diffusion with EpiSimdemics.
Parallel Science and Engineering Applications: The Charm++ Approach, 2013
Efficient Intranode Communication in GPU-Accelerated Systems.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
Simulating the Spread of Infectious Disease over Large Realistic Social Networks Using Charm++.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012
DMA-Assisted, Intranode Communication in GPU Accelerated Systems.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012
MPI-ACC: An Integrated and Extensible Approach to Data Movement in Accelerator-based Systems.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012
Poster: large-scale computational epidemiology modeling using Charm++.
Proceedings of the Conference on High Performance Computing, Networking, Storage and Analysis, 2011
High-performance biocomputing for simulating the spread of contagion over large contact networks.
Proceedings of the IEEE 1st International Conference on Computational Advances in Bio and Medical Sciences, 2011
Bounding the effect of partition camping in GPU kernels.
Proceedings of the 8th Conference on Computing Frontiers, 2011
GPU-RMAP: Accelerating Short-Read Mapping on Graphics Processors.
Proceedings of the 13th IEEE International Conference on Computational Science and Engineering, 2010
On the Robust Mapping of Dynamic Programming onto a Graphics Processing Unit.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009
Cell-SWat: modeling and scheduling wavefront computations on the cell broadband engine.
Proceedings of the 5th Conference on Computing Frontiers, 2008
Optimizing performance, cost, and sensitivity in pairwise sequence search on a cluster of PlayStations.
Proceedings of the 8th IEEE International Conference on Bioinformatics and Bioengineering, 2008