Automatic Tracing in Task-Based Runtime Systems.
CoRR, 2024
Composing Distributed Computations Through Task and Kernel Fusion.
CoRR, 2024
Legate Sparse: Distributed Sparse Computing in Python.
Proceedings of the International Conference for High Performance Computing, 2023
Visibility Algorithms for Dynamic Dependence Analysis and Distributed Coherence.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
Supercomputing in Python With Legate.
Comput. Sci. Eng., 2021
Index launches: scalable, flexible representation of parallel task groups.
Proceedings of the International Conference for High Performance Computing, 2021
Scaling implicit parallelism via dynamic control replication.
Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Legate NumPy: accelerated and distributed array computing.
Proceedings of the International Conference for High Performance Computing, 2019
Dynamic tracing: memoization of task graphs for dynamic task-based runtimes.
Proceedings of the International Conference for High Performance Computing, 2018
Control replication: compiling implicit parallelism to efficient SPMD with logical regions.
Proceedings of the International Conference for High Performance Computing, 2017
Integrating External Resources with a Task-Based Programming Model.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017
Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, 2016
Regent: a high-productivity programming language for HPC with logical regions.
Proceedings of the International Conference for High Performance Computing, 2015
Verification of producer-consumer synchronization in GPU programs.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015
Legion: programming distributed heterogeneous architectures with logical regions.
PhD thesis, 2014
Structure Slicing: Extending Logical Regions with Fields.
Proceedings of the International Conference for High Performance Computing, 2014
Singe: leveraging warp specialization for high performance on GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Realm: an event-based low-level runtime for distributed memory architectures.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
Language support for dynamic, hierarchical data partitioning.
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013
Legion: expressing locality and independence with logical regions.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
CudaDMA: optimizing GPU memory bandwidth via warp specialization.
Proceedings of the Conference on High Performance Computing Networking, 2011
Programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011