We stand with Ukraine

Tal Ben-Nun

Orcid: 0000-0002-3657-6568

According to our database, Tal Ben-Nun authored at least 76 papers between 2009 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Timeline

[Chart omitted: publications per year by type (Book, In proceedings, Article, PhD thesis, Dataset, Other).]

Bibliography

2024

Lion Cub: Minimizing Communication Overhead in Distributed Lion.

Satoki Ishikawa

,

,

Brian Van Essen

,

,

CoRR, 2024

Autonomous Execution for Multi-GPU Systems: Compiler Support.

Javid Baydamirli

,

,

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.

Lukas Gianinazzi

,

Alexandros Nikolaos Ziogas

,

,

Piotr Luczynski

,

Saleh Ashkboosh

,

Florian Scheidl

,

,

,

,

,

,

Torsten Hoefler

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Low-Depth Spatial Tree Algorithms.

,

,

,

Lukas Gianinazzi

,

Torsten Hoefler

,

Piotr Luczynski

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

Arrow Matrix Decompositions.

Lukas Gianinazzi

,

Alexandros Nikolaos Ziogas

,

Piotr Luczynski

,

Saleh Ashkboosh

,

,

Florian Scheidl

,

,

,

,

,

Torsten Hoefler

Dataset, April, 2023

Performance on HPC Platforms Is Possible Without C++.

,

,

Bradford L. Chamberlain

,

Bronis R. de Supinski

,

Damian W. I. Rouson

Comput. Sci. Eng., 2023

ComPile: A Large IR Dataset from Production Sources.

,

,

Konstantinos Parasyris

,

,

,

William S. Moses

,

Jose Manuel Monsalve Diaz

,

,

Johannes Doerfert

CoRR, 2023

Cached Operator Reordering: A Unified View for Fast GNN Training.

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2023

STen: Productive and Efficient Sparsity in PyTorch.

,

,

,

,

Torsten Hoefler

CoRR, 2023

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization.

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

CoRR, 2023

A Theory of I/O-Efficient Sparse Neural Network Inference.

,

,

Torsten Hoefler

CoRR, 2023

FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs.

,

,

,

Alexandru Calotoiu

,

Alexandros Nikolaos Ziogas

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores.

Roberto L. Castro

,

,

,

,

Basilio B. Fraguela

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 37th International Conference on Supercomputing, 2023

Maximum Flows in Parametric Graph Templates.

,

Lukas Gianinazzi

,

Torsten Hoefler

,

Proceedings of the Algorithms and Complexity - 13th International Conference, 2023

Bridging Control-Centric and Data-Centric Optimization.

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

2022

Python FPGA Programming with Data-Centric Multi-Level Design.

Johannes de Fine Licht

,

Tiziano De Matteis

,

,

,

,

,

Carl-Johannes Johnsen

,

Torsten Hoefler

CoRR, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecast.

,

,

,

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

CoRR, 2022

Deinsum: Practically I/O Optimal Multilinear Algebra.

Alexandros Nikolaos Ziogas

,

Grzegorz Kwasniewski

,

,

,

Torsten Hoefler

CoRR, 2022

The spatial computer: A model for energy-efficient parallel computation.

Lukas Gianinazzi

,

,

,

,

Piotr Luczynski

,

Torsten Hoefler

CoRR, 2022

Deinsum: Practically I/O Optimal Multi-Linear Algebra.

Alexandros Nikolaos Ziogas

,

Grzegorz Kwasniewski

,

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Boosting Performance Optimization with Interactive Data Movement Visualization.

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Productive Performance Engineering for Weather and Climate Modeling with Python.

,

,

Florian Deconinck

,

,

,

,

,

,

Jeremy McGibbon

,

,

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts.

,

,

,

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A data-centric optimization framework for machine learning.

,

,

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Lifting C semantics for dataflow optimization.

Alexandru Calotoiu

,

,

Grzegorz Kwasniewski

,

Johannes de Fine Licht

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping.

Carl-Johannes Johnsen

,

Tiziano De Matteis

,

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

2021

Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.

,

,

Giorgi Nadiradze

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.

Torsten Hoefler

,

,

,

,

Alexandra Peste

J. Mach. Learn. Res., 2021

Learning Combinatorial Node Labeling Algorithms.

Lukas Gianinazzi

,

Maximilian Fries

,

,

,

,

Torsten Hoefler

CoRR, 2021

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.

Grzegorz Kwasniewski

,

,

Lukas Gianinazzi

,

Alexandru Calotoiu

,

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Productivity, portability, performance: data-centric Python.

Alexandros Nikolaos Ziogas

,

,

,

Alexandru Calotoiu

,

Tiziano De Matteis

,

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.

Grzegorz Kwasniewski

,

,

,

Alexandros Nikolaos Ziogas

,

Jens Eirik Saethre

,

André Gaillard

,

,

,

Anton Kozhevnikov

,

Joost VandeVondele

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Clairvoyant prefetching for distributed machine learning I/O.

,

Roman Böhringer

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.

Grzegorz Kwasniewski

,

,

Alexandros Nikolaos Ziogas

,

,

,

Torsten Hoefler

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Data Movement Is All You Need: A Case Study on Optimizing Transformers.

,

,

,

,

Torsten Hoefler

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

NPBench: a benchmarking suite for high-performance NumPy.

Alexandros Nikolaos Ziogas

,

,

,

Torsten Hoefler

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

Michael F. P. O'Boyle

,

Proceedings of the 38th International Conference on Machine Learning, 2021

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems.

Johannes de Fine Licht

,

,

Tiziano De Matteis

,

,

,

Torsten Hoefler

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020

Substream-Centric Maximum Matchings on FPGA.

,

,

,

Dimitri Stanojevic

,

Johannes de Fine Licht

,

Torsten Hoefler

ACM Trans. Reconfigurable Technol. Syst., 2020

Groute: Asynchronous Multi-GPU Programming Model with Applications to Large-scale Graph Processing.

,

,

,

ACM Trans. Parallel Comput., 2020

Deep Data Flow Analysis.

,

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

Michael F. P. O'Boyle

CoRR, 2020

Parametric Graph Templates: Properties and Algorithms.

,

Lukas Gianinazzi

,

Torsten Hoefler

,

CoRR, 2020

Deep Learning for Post-Processing Ensemble Weather Forecasts.

Peter Grönquist

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2020

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.

,

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2020

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis.

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

CoRR, 2020

Workflows are the New Applications: Challenges in Performance, Portability, and Productivity.

,

,

Daisy S. Hollman

,

,

Chris J. Newburn

Proceedings of the IEEE/ACM International Workshop on Performance, 2020

Taming unbalanced training workloads in deep learning with partial collective operations.

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Augment Your Batch: Improving Generalization Through Instance Repetition.

,

,

,

,

Torsten Hoefler

,

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.

,

Torsten Hoefler

ACM Comput. Surv., 2019

A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

CoRR, 2019

Predicting Weather Uncertainty with Deep Convnets.

Peter Grönquist

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.

,

Berry Weinstein

,

,

,

Torsten Hoefler

,

CoRR, 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges.

,

Dimitri Stanojevic

,

Johannes de Fine Licht

,

,

Torsten Hoefler

CoRR, 2019

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.

,

Johannes de Fine Licht

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

CoRR, 2019

Augment your batch: better training with larger batches.

,

,

,

,

Torsten Hoefler

,

CoRR, 2019

Optimizing the data movement in quantum transport simulations via data-centric parallel programming.

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.

,

Johannes de Fine Licht

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.

,

,

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Substream-Centric Maximum Matchings on FPGA.

,

,

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching.

,

,

Torsten Hoefler

,

Satoshi Matsuoka

CoRR, 2018

Neural Code Comprehension: A Learnable Representation of Code Semantics.

,

Alice Shoshana Jakobovits

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Optimizing Parallel Graph Connectivity Computation via Subgraph Sampling.

,

,

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Accelerating Deep Learning Frameworks with Micro-Batches.

,

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Groute: An Asynchronous Multi-GPU Programming Model for Irregular Computations.

,

,

,

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Big data causing big (TLB) problems: taming random memory accesses on the GPU.

,

,

Matthias Werner

,

,

Wolfgang Lehner

Proceedings of the 13th International Workshop on Data Management on New Hardware, 2017

2016

FFMK: A Fast and Fault-Tolerant Microkernel-Based System for Exascale Computing.

Carsten Weinhold

,

Adam Lackorzynski

,

,

Martin Küttler

,

,

Hermann Härtig

,

,

,

,

,

,

Thorsten Schütt

,

,

Alexander Reinefeld

,

Matthias Lieber

,

Wolfgang E. Nagel

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Memory-Oriented Programming: A Data-Centric Programming Model for Systems with Multiple Parallel Accelerators (additional title page in Hebrew: Memory-Oriented Programming: A Programming Model for Systems with Multiple Parallel Accelerators).

PhD thesis, 2016

Spline-based parallel nonlinear optimization of function sequences.

,

,

J. Parallel Distributed Comput., 2016

Reciprocal Grids: A Hierarchical Algorithm for Computing Solution X-ray Scattering Curves from Supramolecular Complexes at High Resolution.

,

,

,

,

,

J. Chem. Inf. Model., 2016

Adaptive Work-Efficient Connected Components on the GPU.

,

,

,

,

CoRR, 2016

2015

Memory access patterns: the missing piece of the multi-GPU puzzle.

,

,

,

Proceedings of the International Conference for High Performance Computing, 2015

2014

MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction.

,

,

,

ACM Trans. Archit. Code Optim., 2014

2010

Design and implementation of a generic resource sharing virtual time dispatcher.

,

,

Dror G. Feitelson

Proceedings of SYSTOR 2010: The 3rd Annual Haifa Experimental Systems Conference, 2010

2009

A global scheduling framework for virtualization environments.

,

,

Dror G. Feitelson

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
