Enabling Homomorphically Encrypted Inference for Large DNN Models.
IEEE Trans. Computers, 2022
cuConv: CUDA implementation of convolution for CNN inference.
Clust. Comput., 2022
ecoHMEM: Improving Object Placement Methodology for Hybrid Memory Systems in HPC.
Proceedings of the IEEE International Conference on Cluster Computing, 2022
Performance Evaluation of cuDNN Convolution Algorithms on NVIDIA Volta GPUs.
IEEE Access, 2019
Efficient exception handling support for GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Direct Inter-Process Communication (dIPC): Repurposing the CODOMs Architecture to Accelerate IPC.
Proceedings of the Twelfth European Conference on Computer Systems, 2017
GPU-SM: shared memory multi-GPU programming.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015
Comparison based sorting for systems with multiple GPUs.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013