Dhiraj D. Kalamkar
According to our database, Dhiraj D. Kalamkar authored at least 31 papers between 2007 and 2024.
Bibliography
2024
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures.
CoRR, 2023
2022
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning and HPC Workloads.
Frontiers Appl. Math. Stat., 2022
Accelerating Deep Learning based Identification of Chromatin Accessibility from noisy ATAC-seq Data.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
2021
Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads.
CoRR, 2021
Proceedings of the International Conference for High Performance Computing, 2021
Tensor processing primitives: a programming abstraction for efficiency and portability in deep learning workloads.
Proceedings of the International Conference for High Performance Computing, 2021
2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
2019
Supercomput. Front. Innov., 2019
Proceedings of the 2019 IEEE International Conference on Cluster Computing, 2019
2018
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 6th International Conference on Learning Representations, 2018
2016
Optimizations in a high-performance conjugate gradient benchmark for IA-based multi- and many-core processors.
Int. J. High Perform. Comput. Appl., 2016
Proceedings of the High Performance Computing, 2016
2015
Improving concurrency and asynchrony in multithreaded MPI applications using software offloading.
Proceedings of the International Conference for High Performance Computing, 2015
2014
Enabling Efficient Multithreaded MPI Communication through a Library-Based Implementation of MPI Endpoints.
Proceedings of the International Conference for High Performance Computing, 2014
Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices.
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Improving Communication Performance and Scalability of Native Applications on Intel Xeon Phi Coprocessor Clusters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
2013
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013
2012
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Analysis and Optimization of Financial Analytics Benchmark on Modern Multi- and Many-core IA-Based Architectures.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
2007
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007