Tal Ben-Nun
Orcid: 0000-0002-3657-6568
According to our database, Tal Ben-Nun authored at least 76 papers between 2009 and 2024.
Bibliography
2024
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
2023
Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization.
CoRR, 2023
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the International Conference for High Performance Computing, 2023
Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the Algorithms and Complexity - 13th International Conference, 2023
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023
2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022
2021
Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.
IEEE Trans. Parallel Distributed Syst., 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.
J. Mach. Learn. Res., 2021
Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021
Proceedings of the International Conference for High Performance Computing, 2021
On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the International Conference for High Performance Computing, 2021
On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021
ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.
Proceedings of the 38th International Conference on Machine Learning, 2021
StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021
2020
ACM Trans. Reconfigurable Technol. Syst., 2020
Groute: Asynchronous Multi-GPU Programming Model with Applications to Large-scale Graph Processing.
ACM Trans. Parallel Comput., 2020
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.
CoRR, 2020
CoRR, 2020
Workflows are the New Applications: Challenges in Performance, Portability, and Productivity.
Proceedings of the IEEE/ACM International Workshop on Performance, 2020
Taming unbalanced training workloads in deep learning with partial collective operations.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020
2019
Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.
ACM Comput. Surv., 2019
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.
CoRR, 2019
Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.
CoRR, 2019
Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.
CoRR, 2019
Optimizing the data movement in quantum transport simulations via data-centric parallel programming.
Proceedings of the International Conference for High Performance Computing, 2019
A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.
Proceedings of the International Conference for High Performance Computing, 2019
Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.
Proceedings of the International Conference for High Performance Computing, 2019
A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019
2018
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
Proceedings of the IEEE International Conference on Cluster Computing, 2018
2017
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
Proceedings of the 13th International Workshop on Data Management on New Hardware, 2017
2016
Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016
Memory-Oriented Programming: A Data-Centric Programming Model for Systems with Multiple Parallel Accelerators (additional title page in Hebrew: Memory-Oriented Programming: A Programming Model for Systems with Multiple Parallel Accelerators).
PhD thesis, 2016
J. Parallel Distributed Comput., 2016
Reciprocal Grids: A Hierarchical Algorithm for Computing Solution X-ray Scattering Curves from Supramolecular Complexes at High Resolution.
J. Chem. Inf. Model., 2016
2015
Proceedings of the International Conference for High Performance Computing, 2015
2014
MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction.
ACM Trans. Archit. Code Optim., 2014
2010
Proceedings of SYSTOR 2010: The 3rd Annual Haifa Experimental Systems Conference, 2010
2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009