Torsten Hoefler

Orcid: 0000-0002-1333-9797

Affiliations:

ETH Zürich

According to our database, Torsten Hoefler authored at least 420 papers between 2005 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Awards

ACM Fellow

ACM Fellow 2022, "For foundational contributions to High-Performance Computing and the application of HPC techniques to machine learning".

Bibliography

2024

AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost.

,

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., August, 2024

Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis.

,

Torsten Hoefler

IEEE Trans. Pattern Anal. Mach. Intell., May, 2024

Canary: Congestion-aware in-network allreduce using dynamic trees.

Daniele De Sensi

,

Edgar Costa Molero

,

Salvatore Di Girolamo

,

Laurent Vanbever

,

Torsten Hoefler

Future Gener. Comput. Syst., March, 2024

Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries.

,

Robert Gerstenberger

,

,

,

Michal Podstawski

,

Claude Barthels

,

,

Torsten Hoefler

ACM Comput. Surv., February, 2024

A High-Performance, Energy-Efficient Modular DMA Engine Architecture.

,

Michael Rogenmoser

,

,

,

Alessandro Ottaviano

,

,

Torsten Hoefler

,

IEEE Trans. Computers, January, 2024

Digital twins of Earth and the computing challenge of human interaction.

,

Torsten Hoefler

,

,

Wilco Hazeleger

Nat. Comput. Sci., 2024

RED-SEA Project: Towards a new-generation European interconnect.

María Engracia Gómez

,

Julio Sahuquillo

,

Andrea Biagioni

,

,

,

Ottorino Frezza

,

Francesca Lo Cicero

,

Alessandro Lonardo

,

Michele Martinelli

,

Pier Stanislao Paolucci

,

Elena Pastorelli

,

Francesco Simula

,

Matteo Turisini

,

,

Roberto Ammendola

,

Carlotta Chiarini

,

,

Fabrizio Capuani

,

Adrián Castelló

,

,

Eugenio Stabile

,

Enrique S. Quintana-Ortí

,

Pascale Bernier-Bruna

,

,

Pierre-Axel Lagadec

,

Gregoire Pichon

,

,

Manolis Katevenis

,

Sokratis Bartzis

,

Orestis Mousouros

,

Pantelis Xirouchakis

,

Vangelis Mageiropoulos

,

Michalis Gianioudis

,

,

Aggelos Ioannou

,

Nikos Kallimanis

,

Miguel Sánchez de la Rosa

,

Gabriel Gomez-Lopez

,

Francisco Alfaro-Cortés

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

José L. Sánchez

,

Gaetan De Gassowski

,

Matthieu Hautreaux

,

Stephane Mathieu

,

,

,

,

Torsten Hoefler

,

,

,

Giuseppe Piero Brandino

,

Francesco De Giorgi

,

,

Iakovos Mavroidis

,

Yannis Papaefstathiou

,

Nikolaos Tampouratzis

,

Benjamin Kalisch

,

Ulrich Krackhardt

,

Mondrian Nuessle

,

Wolfgang Frings

,

Dominik Gottwald

,

Felime Guimaraes

,

,

,

,

,

,

,

Jennifer Lopez Barillao

,

,

Microprocess. Microsystems, 2024

All models are wrong, some are useful: Model Selection with Limited Labels.

Patrik Okanovic

,

,

,

Torsten Hoefler

,

,

Nezihe Merve Gürel

CoRR, 2024

Fortify Your Foundations: Practical Privacy and Security for Foundation Model Deployments In The Cloud.

,

Anjo Vahldiek-Oberwagner

,

Marcin Spoczynski

,

Scott Constable

,

,

Torsten Hoefler

CoRR, 2024

SeBS-Flow: Benchmarking Serverless Cloud Function Workflows.

,

,

Alexandru Calotoiu

,

Laurin Brandner

,

,

Torsten Hoefler

CoRR, 2024

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects.

Daniele De Sensi

,

Lorenzo Pichetti

,

,

Tiziano De Matteis

,

,

,

Matteo Turisini

,

Daniele Cesarini

,

,

Animesh Trivedi

,

,

,

Salvatore Di Girolamo

,

Torsten Hoefler

CoRR, 2024

Network-Offloaded Bandwidth-Optimal Broadcast and Allgather for Distributed AI.

Mikhail Khalilov

,

Salvatore Di Girolamo

,

,

,

,

Torsten Hoefler

CoRR, 2024

Hardware Acceleration for Knowledge Graph Processing: Challenges & Recent Developments.

,

Robert Gerstenberger

,

,

Pournima Sonawane

,

Juan Gómez-Luna

,

Raghavendra Kanakagiri

,

,

,

Torsten Hoefler

,

,

CoRR, 2024

MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models.

,

Roberto L. Castro

,

,

Torsten Hoefler

,

CoRR, 2024

Understanding Data Movement in Tightly Coupled Heterogeneous Systems: A Case Study with the Grace Hopper Superchip.

,

Mikhail Khalilov

,

,

Giridhar Chukkapalli

,

Thomas C. Schulthess

,

Torsten Hoefler

CoRR, 2024

High Performance Unstructured SpMM Computation Using Tensor Cores.

Patrik Okanovic

,

Grzegorz Kwasniewski

,

Paolo Sylos Labini

,

,

,

Torsten Hoefler

CoRR, 2024

REPS: Recycling Entropies for Packet Spraying to Adaptively Explore Paths and Mitigate Failures.

,

,

Ahmad Ghalayini

,

Mohammad Dohadwala

,

Michael Papamichael

,

Daniele De Sensi

,

Torsten Hoefler

CoRR, 2024

Demystifying Higher-Order Graph Neural Networks.

,

Florian Scheidl

,

Lukas Gianinazzi

,

Shachar Klaiman

,

Jürgen Müller

,

Torsten Hoefler

CoRR, 2024

Accelerating Graph-based Vector Search via Delayed-Synchronization Traversal.

,

,

Torsten Hoefler

,

CoRR, 2024

Multi-Head RAG: Solving Multi-Aspect Problems with LLMs.

,

,

,

Robert Gerstenberger

,

Lucas Weitzendorf

,

,

,

,

,

Jürgen Müller

,

Hubert Niewiadomski

,

,

Michal Podstawski

,

Torsten Hoefler

CoRR, 2024

CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks.

,

Lorenzo Paleari

,

,

,

Robert Gerstenberger

,

,

,

Hubert Niewiadomski

,

Torsten Hoefler

CoRR, 2024

FPsPIN: An FPGA-based Open-Hardware Research Platform for Processing in the Network.

,

,

Torsten Hoefler

CoRR, 2024

Towards Specialized Supercomputers for Climate Sciences: Computational Requirements of the Icosahedral Nonhydrostatic Weather and Climate Model.

Torsten Hoefler

,

Alexandru Calotoiu

,

Anurag Dipankar

,

Thomas C. Schulthess

,

Xavier Lapillonne

,

CoRR, 2024

SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels.

,

Torsten Hoefler

CoRR, 2024

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming.

,

,

,

,

,

,

Robert Wisniewski

,

Torsten Hoefler

CoRR, 2024

SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies.

,

,

Daniele De Sensi

,

,

,

,

,

,

,

Ahmad Ghalayini

,

Daniel S. F. Alves

,

Michael Papamichael

,

Adrian M. Caulfield

,

Torsten Hoefler

CoRR, 2024

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs.

,

Amirkeivan Mohtashami

,

Maximilian L. Croci

,

,

,

,

Torsten Hoefler

,

CoRR, 2024

Topologies of Reasoning: Demystifying Chains, Trees, and Graphs of Thoughts.

,

,

,

Robert Gerstenberger

,

,

,

,

Grzegorz Kwasniewski

,

Jürgen Müller

,

Lukas Gianinazzi

,

,

Hubert Niewiadomski

,

,

Torsten Hoefler

CoRR, 2024

Cppless: Productive and Performant Serverless Programming in C++.

,

,

Alexandru Calotoiu

,

Torsten Hoefler

CoRR, 2024

XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing.

Torsten Hoefler

,

,

,

,

,

Manish Parashar

,

,

Matthias Troyer

,

Thomas C. Schulthess

,

,

Jack J. Dongarra

CoRR, 2024

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs.

Mikhail Khalilov

,

,

,

Alessandro Vezzu

,

,

Salvatore Di Girolamo

,

,

Daniele De Sensi

,

,

Torsten Hoefler

Proceedings of the 2024 USENIX Annual Technical Conference, 2024

PolarStar: Expanding the Horizon of Diameter-3 Networks.

Kartik Lakhotia

,

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024

Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication.

Lukas Gianinazzi

,

Alexandros Nikolaos Ziogas

,

,

Piotr Luczynski

,

Saleh Ashkboosh

,

Florian Scheidl

,

,

,

,

,

,

Torsten Hoefler

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Swing: Short-cutting Rings for Higher Bandwidth Allreduce.

Daniele De Sensi

,

,

,

Torsten Hoefler

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network.

,

,

Daniele De Sensi

,

,

,

,

,

Marek Konieczny

,

Kartik Lakhotia

,

,

,

Fabrizio Petrini

,

Torsten Hoefler

Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, 2024

Software Resource Disaggregation for HPC with Serverless Computing.

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Low-Depth Spatial Tree Algorithms.

,

,

,

Lukas Gianinazzi

,

Torsten Hoefler

,

Piotr Luczynski

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

DiffDA: a Diffusion model for weather-scale Data Assimilation.

,

Lukas Gianinazzi

,

,

Peter D. Düben

,

Torsten Hoefler

Proceedings of the Forty-first International Conference on Machine Learning, 2024

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression.

,

Ruslan Svirschevski

,

Vage Egiazarian

,

Denis Kuznedelev

,

,

,

Alexander Borzunov

,

Torsten Hoefler

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

SliceGPT: Compress Large Language Models by Deleting Rows and Columns.

,

Maximilian L. Croci

,

Marcelo Gennari Do Nascimento

,

Torsten Hoefler

,

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Near-Optimal Wafer-Scale Reduce.

Piotr Luczynski

,

Lukas Gianinazzi

,

,

Leighton Wilson

,

Daniele De Sensi

,

Torsten Hoefler

Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example.

,

Alexandru Calotoiu

,

,

Konstantin Taranov

,

Torsten Hoefler

Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models.

,

,

,

,

,

,

Torsten Hoefler

,

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems Through Polling-Free and Retry-Free Operation.

,

Marc Gantenbein

,

Alessandro Ottaviano

,

Torsten Hoefler

,

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Process-as-a-Service: Unifying Elastic and Stateful Clouds with Serverless Processes.

,

Alexandru Calotoiu

,

,

Roman Böhringer

,

,

Torsten Hoefler

Proceedings of the 2024 ACM Symposium on Cloud Computing, 2024

Graph of Thoughts: Solving Elaborate Problems with Large Language Models.

,

,

,

Robert Gerstenberger

,

Michal Podstawski

,

Lukas Gianinazzi

,

,

,

Hubert Niewiadomski

,

,

Torsten Hoefler

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Sparse Stream Semantic Registers: A Lightweight ISA Extension Accelerating General Sparse Linear Algebra.

,

,

,

Torsten Hoefler

,

IEEE Trans. Parallel Distributed Syst., December, 2023

Performance Measurement Dataset of the HPC Benchmarks FASTEST, Kripke, LULESH, MiniFE, Quicksilver, and RELeARN for Scalability Studies with Extra-P.

,

Alexandru Calotoiu

,

Sebastian Rinke

,

Thorsten Reimann

,

Torsten Hoefler

,

Dataset, November, 2023

Myths and legends in high-performance computing.

Satoshi Matsuoka

,

,

,

Aleksandr Drozd

,

Torsten Hoefler

Int. J. High Perform. Comput. Appl., July, 2023

Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems.

,

,

Vasiliki Kalavri

,

Michael Kapralov

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., June, 2023

GNN Scaling 0.1 Software Artifact.

,

,

Robert Gerstenberger

,

Paolo Sylos Labini

,

Alexandros Nikolaos Ziogas

,

,

Lukas Gianinazzi

,

Florian Scheidl

,

,

,

,

Grzegorz Kwasniewski

,

Raghavendra Kanakagiri

,

,

,

,

,

Torsten Hoefler

Dataset, June, 2023

GDI-RMA 0.1 Software Artifact.

,

Robert Gerstenberger

,

,

Michal Podstawski

,

Jürgen Müller

,

,

,

George Mitenkov

,

Marek T. Michalewicz

,

Torsten Hoefler

Dataset, June, 2023

Disentangling Hype from Practicality: On Realistically Achieving Quantum Advantage.

Torsten Hoefler

,

,

Matthias Troyer

Commun. ACM, May, 2023

Arrow Matrix Decompositions.

Lukas Gianinazzi

,

Alexandros Nikolaos Ziogas

,

Piotr Luczynski

,

Saleh Ashkboosh

,

,

Florian Scheidl

,

,

,

,

,

Torsten Hoefler

Dataset, April, 2023

Earth Virtualization Engines: A Technical Perspective.

Comput. Sci. Eng., 2023

How to Prune Your Language Model: Recovering Accuracy on the "Sparsity May Cry" Benchmark.

,

Torsten Hoefler

,

CoRR, 2023

RapidChiplet: A Toolchain for Rapid Design Space Exploration of Chiplet Architectures.

,

Benigna Bruggmann

,

,

,

Torsten Hoefler

CoRR, 2023

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models.

,

,

,

Torsten Hoefler

,

CoRR, 2023

Towards End-to-end 4-Bit Inference on Generative Large Language Models.

,

,

,

,

,

,

Torsten Hoefler

,

CoRR, 2023

Cached Operator Reordering: A Unified View for Fast GNN Training.

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2023

High-Performance Graph Databases That Are Portable, Programmable, and Scale to Hundreds of Thousands of Cores.

,

Robert Gerstenberger

,

,

Michal Podstawski

,

Jürgen Müller

,

,

,

George Mitenkov

,

Wojciech Chlapek

,

Marek T. Michalewicz

,

Torsten Hoefler

CoRR, 2023

ASDL: A Unified Interface for Gradient Preconditioning in PyTorch.

,

Satoki Ishikawa

,

,

,

Torsten Hoefler

CoRR, 2023

STen: Productive and Efficient Sparsity in PyTorch.

,

,

,

,

Torsten Hoefler

CoRR, 2023

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization.

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

CoRR, 2023

PolarStar: Expanding the Scalability Horizon of Diameter-3 Networks.

Kartik Lakhotia

,

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

CoRR, 2023

Datacenter Ethernet and RDMA: Issues at Hyperscale.

Torsten Hoefler

,

,

Keith D. Underwood

,

,

,

Vahid Tabatabaee

,

,

Surendra Anubolu

,

,

,

,

CoRR, 2023

Approximate Reversible Circuits for NISQ-Era Quantum Computers.

,

,

Torsten Hoefler

CoRR, 2023

AutoDDL: Automatic Distributed Deep Learning with Asymptotically Optimal Communication.

,

,

,

,

Torsten Hoefler

CoRR, 2023

A Theory of I/O-Efficient Sparse Neural Network Inference.

,

,

Torsten Hoefler

CoRR, 2023

Data Center Ethernet and Remote Direct Memory Access: Issues at Hyperscale.

Torsten Hoefler

,

,

Keith D. Underwood

,

Robert Alverson

,

,

Vahid Tabatabaee

,

,

Surendra Anubolu

,

,

,

,

Computer, 2023

SAGE: Software-based Attestation for GPU Execution.

,

Benjamin Rothenberger

,

,

,

Torsten Hoefler

,

Proceedings of the 2023 USENIX Annual Technical Conference, 2023

In-network Allreduce with Multiple Spanning Trees on PolarFly.

Kartik Lakhotia

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures, 2023

A Reference Implementation for a Quantum Message Passing Interface.

,

,

Samuel Alexander Stein

,

,

,

Martin Roetteler

,

Torsten Hoefler

,

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs.

,

,

,

Alexandru Calotoiu

,

Alexandros Nikolaos Ziogas

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

Co-design Hardware and Algorithm for Vector Search.

,

,

,

Johannes de Fine Licht

,

,

,

Cédric Renggli

,

,

Theodoros Rekatsinas

,

Torsten Hoefler

,

Proceedings of the International Conference for High Performance Computing, 2023

HEAR: Homomorphically Encrypted Allreduce.

,

Mikhail Khalilov

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores.

Roberto L. Castro

,

,

,

,

Basilio B. Fraguela

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

High-Performance and Programmable Attentional Graph Neural Networks with Global Tensor Formulations.

,

,

Robert Gerstenberger

,

Paolo Sylos Labini

,

Alexandros Nikolaos Ziogas

,

,

Lukas Gianinazzi

,

Florian Scheidl

,

,

,

,

Grzegorz Kwasniewski

,

Raghavendra Kanakagiri

,

,

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

The Graph Database Interface: Scaling Online Transactional and Analytical Graph Workloads to Hundreds of Thousands of Cores.

,

Robert Gerstenberger

,

,

Michal Podstawski

,

,

,

George Mitenkov

,

Wojciech Chlapek

,

Marek T. Michalewicz

,

Hubert Niewiadomski

,

Jürgen Müller

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2023

PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices.

,

,

Torsten Hoefler

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

HOT: Higher-Order Dynamic Graph Representation Learning With Efficient Transformers.

,

Afonso Claudino Catarino

,

Lukas Gianinazzi

,

,

,

Hubert Niewiadomski

,

Torsten Hoefler

Proceedings of the Learning on Graphs Conference, 27-30 November 2023, Virtual Event., 2023

rFaaS: Enabling High Performance Serverless with RDMA and Leases.

,

Konstantin Taranov

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Performance Embeddings: A Similarity-Based Transfer Tuning Approach to Performance Optimization.

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 37th International Conference on Supercomputing, 2023

FMI: Fast and Cheap Message Passing for Serverless Functions.

,

Roman Böhringer

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 37th International Conference on Supercomputing, 2023

Compressing multidimensional weather and climate data into neural networks.

,

Torsten Hoefler

Proceedings of the Eleventh International Conference on Learning Representations, 2023

OPTQ: Accurate Quantization for Generative Pre-trained Transformers.

,

,

Torsten Hoefler

,

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Differentiable Transportation Pruning.

,

Jan C. van Gemert

,

Torsten Hoefler

,

,

Evangelos Eleftheriou

,

Bram-Ernst Verhoef

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Streaming Task Graph Scheduling for Dataflow Architectures.

Tiziano De Matteis

,

Lukas Gianinazzi

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, 2023

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement.

,

,

Matheus A. Cavalcante

,

,

,

Torsten Hoefler

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Sparse Hamming Graph: A Customizable Network-on-Chip Topology.

,

,

Matheus A. Cavalcante

,

,

,

Torsten Hoefler

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Maximum Flows in Parametric Graph Templates.

,

Lukas Gianinazzi

,

Torsten Hoefler

,

Proceedings of the Algorithms and Complexity - 13th International Conference, 2023

Bridging Control-Centric and Data-Centric Optimization.

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

User-guided Page Merging for Memory Deduplication in Serverless Systems.

,

,

,

Alexandru Calotoiu

,

Torsten Hoefler

Proceedings of the IEEE International Conference on Big Data, 2023

2022

Work-Stealing Prefix Scan: Addressing Load Imbalance in Large-Scale Image Registration.

,

,

Torsten Hoefler

,

Paolo Bientinesi

,

Benjamin Berkels

IEEE Trans. Parallel Distributed Syst., 2022

Python FPGA Programming with Data-Centric Multi-Level Design.

Johannes de Fine Licht

,

Tiziano De Matteis

,

,

,

,

,

Carl-Johannes Johnsen

,

Torsten Hoefler

CoRR, 2022

Efficient RDMA Communication Protocols.

Konstantin Taranov

,

,

Torsten Hoefler

CoRR, 2022

Assessing requirements to scale to practical quantum advantage.

Michael E. Beverland

,

,

Matthias Troyer

,

Krysta M. Svore

,

Torsten Hoefler

,

Vadym Kliuchnikov

,

,

,

Aarthi Sundaram

,

Alexander Vaschillo

CoRR, 2022

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers.

,

,

Torsten Hoefler

,

CoRR, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecast.

,

,

,

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

CoRR, 2022

Deinsum: Practically I/O Optimal Multilinear Algebra.

Alexandros Nikolaos Ziogas

,

Grzegorz Kwasniewski

,

,

,

Torsten Hoefler

CoRR, 2022

The spatial computer: A model for energy-efficient parallel computation.

Lukas Gianinazzi

,

,

,

,

Piotr Luczynski

,

Torsten Hoefler

CoRR, 2022

FaasKeeper: a Blueprint for Serverless Services.

,

Alexandru Calotoiu

,

Konstantin Taranov

,

Torsten Hoefler

CoRR, 2022

The Convergence of Hyperscale Data Center and High-Performance Computing Networks.

Torsten Hoefler

,

,

Computer, 2022

Benchmarking Data Science: 12 Ways to Lie With Statistics and Performance on Parallel Computers.

Torsten Hoefler

Computer, 2022

The Red-Blue Pebble Game on Trees and DAGs with Large Input.

,

Torsten Hoefler

Proceedings of the Structural Information and Communication Complexity, 2022

KafkaDirect: Zero-copy Data Access for Apache Kafka over RDMA Networks.

Konstantin Taranov

,

,

Virendra J. Marathe

,

Torsten Hoefler

Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Deinsum: Practically I/O Optimal Multi-Linear Algebra.

Alexandros Nikolaos Ziogas

,

Grzegorz Kwasniewski

,

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Boosting Performance Optimization with Interactive Data Movement Visualization.

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Efficient Quantized Sparse Matrix Operations on Tensor Cores.

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

PolarFly: A Cost-Effective and Flexible Low-Diameter Topology.

Kartik Lakhotia

,

,

,

,

,

Torsten Hoefler

,

Fabrizio Petrini

Proceedings of the SC22: International Conference for High Performance Computing, 2022

HammingMesh: A Network Topology for Large-Scale Deep Learning.

Torsten Hoefler

,

,

Daniele De Sensi

,

Salvatore Di Girolamo

,

,

,

,

,

,

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Building Blocks for Network-Accelerated Distributed File Systems.

Salvatore Di Girolamo

,

Daniele De Sensi

,

Konstantin Taranov

,

Milos Malesevic

,

,

,

Severin Kistler

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations.

,

Cesare Miglioli

,

Paolo Sylos Labini

,

,

,

Raghavendra Kanakagiri

,

,

,

Michal Podstawski

,

Grzegorz Kwasniewski

,

,

,

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Productive Performance Engineering for Weather and Climate Modeling with Python.

,

,

Florian Deconinck

,

,

,

,

,

,

Jeremy McGibbon

,

,

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the SC22: International Conference for High Performance Computing, 2022

Near-optimal sparse allreduce for distributed deep learning.

,

Torsten Hoefler

Proceedings of the PPoPP '22: 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, April 2, 2022

Spatial Mixture-of-Experts.

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts.

,

,

,

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Neural Graph Databases.

,

,

Florian Scheidl

,

,

,

Michal Podstawski

,

,

Torsten Hoefler

Proceedings of the Learning on Graphs Conference, 2022

Motif Prediction with Graph Neural Networks.

,

,

Cesare Miglioli

,

,

Grzegorz Kwasniewski

,

,

Raghavendra Kanakagiri

,

,

Lukas Gianinazzi

,

,

Torsten Hoefler

Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching.

András Strausz

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

I/O-Optimal Cache-Oblivious Sparse Matrix-Sparse Matrix Multiplication.

,

,

Torsten Hoefler

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Metamorphic Fuzzing of C++ Libraries.

,

Alastair F. Donaldson

,

,

Torsten Hoefler

Proceedings of the 15th IEEE Conference on Software Testing, Verification and Validation, 2022

Performance-detective: automatic deduction of cheap and accurate performance models.

,

,

Alexandru Calotoiu

,

,

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

A data-centric optimization framework for machine learning.

,

,

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Lifting C semantics for dataflow optimization.

Alexandru Calotoiu

,

,

Grzegorz Kwasniewski

,

Johannes de Fine Licht

,

,

,

Torsten Hoefler

Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022

Neural Parameter Allocation Search.

Bryan A. Plummer

,

,

,

Torsten Hoefler

,

Proceedings of the Tenth International Conference on Learning Representations, 2022

Temporal Vectorization: A Compiler Approach to Automatic Multi-Pumping.

Carl-Johannes Johnsen

,

Tiziano De Matteis

,

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Fast Arbitrary Precision Floating Point on FPGA.

Johannes de Fine Licht

,

Christopher A. Pattison

,

Alexandros Nikolaos Ziogas

,

David Simmons-Duffin

,

Torsten Hoefler

Proceedings of the 30th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2022

Accelerating Data Serialization/Deserialization Protocols with In-Network Compute.

,

Salvatore Di Girolamo

,

Torsten Hoefler

Proceedings of the IEEE/ACM International Workshop on Exascale MPI, 2022

RED-SEA: Network Solution for Exascale Architectures.

Andrea Biagioni

,

,

Ottorino Frezza

,

Francesca Lo Cicero

,

Alessandro Lonardo

,

Michele Martinelli

,

Pier Stanislao Paolucci

,

Elena Pastorelli

,

Francesco Simula

,

Matteo Turisini

,

,

Roberto Ammendola

,

Pascale Bernier-Bruna

,

,

,

,

Pierre-Axel Lagadec

,

Gregoire Pichon

,

,

Gaetan De Gassowski

,

Matthieu Hautreaux

,

Stephane Mathieu

,

,

,

,

Torsten Hoefler

,

,

,

Giuseppe Piero Brandino

,

Francesco De Giorgi

,

,

Iakovos Mavroidis

,

Yannis Papaefstathiou

,

Nikolaos Tampouratzis

,

Benjamin Kalisch

,

Ulrich Krackhardt

,

Mondrian Nuessle

,

Pantelis Xirouchakis

,

Vangelis Mageiropoulos

,

Michalis Gianioudis

,

,

Aggelos Ioannou

,

Nikos Kallimanis

,

,

Manolis Katevenis

,

Wolfgang Frings

,

Dominik Gottwald

,

Felime Guimaraes

,

,

,

,

,

,

,

Jennifer Lopez Barillao

,

,

,

Francisco J. Alfaro

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

José L. Sánchez

,

Adrián Castelló

,

,

María Engracia Gómez

,

Enrique S. Quintana-Ortí

,

Julio Sahuquillo

,

Eugenio Stabile

Proceedings of the 25th Euromicro Conference on Digital System Design, 2022

Circuits for Measurement Based Quantum State Preparation.

,

Torsten Hoefler

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

A RDMA Interface for Ultra-Fast Ultrasound Data-Streaming over an Optical Link.

Andrea Cossettini

,

Konstantin Taranov

,

,

,

Torsten Hoefler

,

Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

NeVerMore: Exploiting RDMA Mistakes in NVMe-oF Storage Applications.

[BibT_eX]

[DOI]

Konstantin Taranov

,

Benjamin Rothenberger

,

Daniele De Sensi

,

,

Torsten Hoefler

Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022

2021

Transformations of High-Level Synthesis Codes for High-Performance Computing.

[BibT_eX]

[DOI]

Johannes de Fine Licht

,

,

Simon Meierhans

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

Breaking (Global) Barriers in Parallel Stochastic Optimization With Wait-Avoiding Group Averaging.

[BibT_eX]

[DOI]

,

,

Giorgi Nadiradze

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.

[BibT_eX]

[DOI]

,

,

Marcel Schneider

,

Marek Konieczny

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2021

Snitch: A Tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

IEEE Trans. Computers, 2021

Stream Semantic Registers: A Lightweight RISC-V ISA Extension Achieving Full Compute Utilization in Single-Issue Cores.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

IEEE Trans. Computers, 2021

Domain-Specific Multi-Level IR Rewriting for GPU: The Open Earth Compiler for GPU-accelerated Climate Simulation.

[BibT_eX]

[DOI]

,

Christoph Müller

,

Oleksandr Zinenko

,

,

,

,

,

Torsten Hoefler

,

ACM Trans. Archit. Code Optim., 2021

Communication Lower Bounds of Bilinear Algorithms for Symmetric Tensor Contractions.

[BibT_eX]

[DOI]

Edgar Solomonik

,

,

Torsten Hoefler

SIAM J. Sci. Comput., 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.

[BibT_eX]

[DOI]

,

Zur Vonarburg-Shmaria

,

Yannick Schaffner

,

Leonardo Schwarz

,

Grzegorz Kwasniewski

,

Lukas Gianinazzi

,

,

,

Tobias Holenstein

,

Sebastian Leisinger

,

Peter Tatkowski

,

,

,

,

Philipp Lindenberger

,

Marek Konieczny

,

,

Torsten Hoefler

Proc. VLDB Endow., 2021

Noise in the Clouds: Influence of Network Performance Variability on Application Scalability.

[BibT_eX]

[DOI]

Daniele De Sensi

,

Tiziano De Matteis

,

Konstantin Taranov

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proc. ACM Meas. Anal. Comput. Syst., 2021

FPL: fast Presburger arithmetic through transprecision.

[BibT_eX]

[DOI]

Arjun Pitchanathan

,

Christian Ulmann

,

,

Torsten Hoefler

,

Proc. ACM Program. Lang., 2021

The digital revolution of Earth-system science.

[BibT_eX]

[DOI]

,

Peter D. Düben

,

Torsten Hoefler

,

,

Thomas C. Schulthess

,

Nat. Comput. Sci., 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

,

Alexandra Peste

J. Mach. Learn. Res., 2021

RFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing.

[BibT_eX]

[DOI]

,

Konstantin Taranov

,

Alexandru Calotoiu

,

Torsten Hoefler

CoRR, 2021

Learning Combinatorial Node Labeling Algorithms.

[BibT_eX]

[DOI]

Lukas Gianinazzi

,

Maximilian Fries

,

,

,

,

Torsten Hoefler

CoRR, 2021

Towards Million-Server Network Simulations on Just a Laptop.

[BibT_eX]

[DOI]

,

Marcel Schneider

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

[BibT_eX]

[DOI]

,

Raghavendra Kanakagiri

,

Grzegorz Kwasniewski

,

Rachata Ausavarungnirun

,

,

Konstantinos Kanellopoulos

,

,

Zur Vonarburg-Shmaria

,

Lukas Gianinazzi

,

,

Juan Gómez-Luna

,

,

Lukas Kapp-Schwoerer

,

Salvatore Di Girolamo

,

Marek Konieczny

,

,

Torsten Hoefler

CoRR, 2021

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra.

[BibT_eX]

[DOI]

,

Zur Vonarburg-Shmaria

,

Yannick Schaffner

,

Leonardo Schwarz

,

Grzegorz Kwasniewski

,

Lukas Gianinazzi

,

,

,

Tobias Holenstein

,

Sebastian Leisinger

,

Peter Tatkowski

,

,

,

,

Philipp Lindenberger

,

,

Marek Konieczny

,

,

Torsten Hoefler

CoRR, 2021

Enabling Dataflow Optimization for Quantum Programs.

[BibT_eX]

[DOI]

,

,

Vadym Kliuchnikov

,

Torsten Hoefler

CoRR, 2021

ReDMArk: Bypassing RDMA Security Mechanisms.

[BibT_eX]

[DOI]

Benjamin Rothenberger

,

Konstantin Taranov

,

,

Torsten Hoefler

Proceedings of the 30th USENIX Security Symposium, 2021

Naos: Serialization-free RDMA networking in Java.

[BibT_eX]

[DOI]

Konstantin Taranov

,

,

,

Torsten Hoefler

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

MigrOS: Transparent Live-Migration Support for Containerised RDMA Applications.

[BibT_eX]

[DOI]

,

,

Leo Sahaya Daphne Antony

,

Torsten Hoefler

,

Hermann Härtig

Proceedings of the 2021 USENIX Annual Technical Conference, 2021

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.

[BibT_eX]

[DOI]

Grzegorz Kwasniewski

,

,

Lukas Gianinazzi

,

Alexandru Calotoiu

,

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Parallel Algorithms for Finding Large Cliques in Sparse Graphs.

[BibT_eX]

[DOI]

Lukas Gianinazzi

,

,

Yannick Schaffner

,

Torsten Hoefler

Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

CoRM: Compactable Remote Memory over RDMA.

[BibT_eX]

[DOI]

Konstantin Taranov

,

Salvatore Di Girolamo

,

Torsten Hoefler

Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Productivity, portability, performance: data-centric Python.

[BibT_eX]

[DOI]

Alexandros Nikolaos Ziogas

,

,

,

Alexandru Calotoiu

,

Tiziano De Matteis

,

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Flare: flexible in-network allreduce.

[BibT_eX]

[DOI]

Daniele De Sensi

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.

[BibT_eX]

[DOI]

Grzegorz Kwasniewski

,

,

,

Alexandros Nikolaos Ziogas

,

Jens Eirik Saethre

,

André Gaillard

,

,

,

Anton Kozhevnikov

,

Joost VandeVondele

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Distributed quantum computing with QMPI.

[BibT_eX]

[DOI]

,

Damian S. Steiger

,

Torsten Hoefler

,

Matthias Troyer

Proceedings of the International Conference for High Performance Computing, 2021

Clairvoyant prefetching for distributed machine learning I/O.

[BibT_eX]

[DOI]

,

Roman Böhringer

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

Chimera: efficiently training large-scale neural networks with bidirectional pipelines.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.

[BibT_eX]

[DOI]

Grzegorz Kwasniewski

,

,

Alexandros Nikolaos Ziogas

,

,

,

Torsten Hoefler

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Extracting clean performance models from tainted programs.

[BibT_eX]

[DOI]

,

Alexandru Calotoiu

,

,

,

,

Torsten Hoefler

Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

Data Movement Is All You Need: A Case Study on Optimizing Transformers.

[BibT_eX]

[DOI]

,

,

,

,

Torsten Hoefler

Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

SeBS: a serverless benchmark suite for function-as-a-service computing.

[BibT_eX]

[DOI]

,

Grzegorz Kwasniewski

,

,

Michal Podstawski

,

Torsten Hoefler

Proceedings of the Middleware '21: 22nd International Middleware Conference, Québec City, Canada, December 6, 2021

SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems.

[BibT_eX]

[DOI]

,

Raghavendra Kanakagiri

,

Grzegorz Kwasniewski

,

Rachata Ausavarungnirun

,

,

Konstantinos Kanellopoulos

,

,

Zur Vonarburg-Shmaria

,

Lukas Gianinazzi

,

,

Juan Gómez-Luna

,

Jakub Golinowski

,

,

Lukas Kapp-Schwoerer

,

Salvatore Di Girolamo

,

,

Marek Konieczny

,

,

Torsten Hoefler

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

A RISC-V in-network accelerator for flexible high-performance low-power packet processing.

[BibT_eX]

[DOI]

Salvatore Di Girolamo

,

,

Alexandru Calotoiu

,

,

,

,

,

Torsten Hoefler

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Noise-Resilient Empirical Performance Modeling with Deep Neural Networks.

[BibT_eX]

[DOI]

,

Alexander Geiß

,

Johannes Wehrstein

,

Alexandru Calotoiu

,

Thorsten Reimann

,

Torsten Hoefler

,

Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

NPBench: a benchmarking suite for high-performance NumPy.

[BibT_eX]

[DOI]

Alexandros Nikolaos Ziogas

,

,

,

Torsten Hoefler

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations.

[BibT_eX]

[DOI]

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

Michael F. P. O'Boyle

,

Proceedings of the 38th International Conference on Machine Learning, 2021

Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra.

[BibT_eX]

[DOI]

,

,

,

Torsten Hoefler

,

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

An Efficient Algorithm for Sparse Quantum State Preparation.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 58th ACM/IEEE Design Automation Conference, 2021

StencilFlow: Mapping Large Stencil Programs to Distributed Spatial Computing Systems.

[BibT_eX]

[DOI]

Johannes de Fine Licht

,

,

Tiziano De Matteis

,

,

,

Torsten Hoefler

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Hermes: Enabling efficient large-scale simulation in MATSim.

[BibT_eX]

[DOI]

,

,

Joschka Bischoff

,

,

Wolfgang Scherr

,

Torsten Hoefler

,

Proceedings of the 12th International Conference on Ambient Systems, 2021

2020

ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications.

[BibT_eX]

[DOI]

Alexandru Calotoiu

,

,

Torsten Hoefler

,

,

,

Proceedings of the Software for Exascale Computing - SPPEXA 2016-2019, 2020

Substream-Centric Maximum Matchings on FPGA.

[BibT_eX]

[DOI]

,

,

,

Dimitri Stanojevic

,

Johannes de Fine Licht

,

Torsten Hoefler

ACM Trans. Reconfigurable Technol. Syst., 2020

Polyhedral Compilation for Racetrack Memories.

[BibT_eX]

[DOI]

,

,

,

Torsten Hoefler

,

Jerónimo Castrillón

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2020

Dawn: a High-level Domain-Specific Language Compiler Toolchain for Weather and Climate Applications.

[BibT_eX]

[DOI]

,

,

Fabian Thuering

,

Torsten Hoefler

,

Supercomput. Front. Innov., 2020

Special issue: Selected papers from EuroMPI 2019.

[BibT_eX]

[DOI]

Jesper Larsson Träff

,

Torsten Hoefler

Parallel Comput., 2020

Assertion-based optimization of Quantum programs.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Matthias Troyer

Proc. ACM Program. Lang., 2020

Fast linear programming through transprecision computing on small and sparse data.

[BibT_eX]

[DOI]

,

Theodoros Theodoridis

,

Maximilian Falkenstein

,

Arjun Pitchanathan

,

,

,

,

Torsten Hoefler

Proc. ACM Program. Lang., 2020

Deep Data Flow Analysis.

[BibT_eX]

[DOI]

,

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

Michael F. P. O'Boyle

CoRR, 2020

Parametric Graph Templates: Properties and Algorithms.

[BibT_eX]

[DOI]

,

Lukas Gianinazzi

,

Torsten Hoefler

,

CoRR, 2020

PsPIN: A high-performance low-power architecture for flexible in-network compute.

[BibT_eX]

[DOI]

Salvatore Di Girolamo

,

,

Alexandru Calotoiu

,

,

,

,

,

Torsten Hoefler

CoRR, 2020

TardiS: Migrating Containers with RDMA Networks.

[BibT_eX]

[DOI]

,

,

Leo Sahaya Daphne Antony

,

Torsten Hoefler

,

Hermann Härtig

CoRR, 2020

High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.

[BibT_eX]

[DOI]

,

,

Marcel Schneider

,

Marek Konieczny

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

CoRR, 2020

Shapeshifter Networks: Cross-layer Parameter Sharing for Scalable and Effective Deep Learning.

[BibT_eX]

[DOI]

Bryan A. Plummer

,

,

,

Torsten Hoefler

,

CoRR, 2020

Domain-Specific Multi-Level IR Rewriting for GPU.

[BibT_eX]

[DOI]

,

Christoph Müller

,

Oleksandr Zinenko

,

,

,

,

,

Torsten Hoefler

,

CoRR, 2020

Deep Learning for Post-Processing Ensemble Weather Forecasts.

[BibT_eX]

[DOI]

Peter Grönquist

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2020

Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging.

[BibT_eX]

[DOI]

,

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2020

ProGraML: Graph-based Deep Learning for Program Optimization and Analysis.

[BibT_eX]

[DOI]

,

Zacharias V. Fisches

,

,

Torsten Hoefler

,

CoRR, 2020

Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

CoRR, 2020

sRDMA - Efficient NIC-based Authentication and Encryption for Remote Direct Memory Access.

[BibT_eX]

[DOI]

Konstantin Taranov

,

Benjamin Rothenberger

,

,

Torsten Hoefler

Proceedings of the 2020 USENIX Annual Technical Conference, 2020

Parallel Planar Subgraph Isomorphism and Vertex Connectivity.

[BibT_eX]

[DOI]

Lukas Gianinazzi

,

Torsten Hoefler

Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

An in-depth analysis of the slingshot interconnect.

[BibT_eX]

[DOI]

Daniele De Sensi

,

Salvatore Di Girolamo

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

fBLAS: streaming linear algebra on FPGA.

[BibT_eX]

[DOI]

Tiziano De Matteis

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

ScalAna: automating scaling loss detection with graph analysis.

[BibT_eX]

[DOI]

,

,

,

,

Torsten Hoefler

,

,

Proceedings of the International Conference for High Performance Computing, 2020

Empirical Modeling of Spatially Diverging Performance.

[BibT_eX]

[DOI]

Alexandru Calotoiu

,

Markus Geisenhofer

,

,

,

,

Torsten Hoefler

,

Martin Oberlack

,

Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

FatPaths: routing in supercomputers and data centers when shortest paths fall short.

[BibT_eX]

[DOI]

,

Marcel Schneider

,

Marek Konieczny

,

,

Erik Henriksson

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

High-performance parallel graph coloring with strong guarantees on work, depth, and quality.

[BibT_eX]

[DOI]

,

,

,

Zur Vonarburg-Shmaria

,

Lukas Gianinazzi

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2020

Communication and Timing Issues with MPI Virtualization.

[BibT_eX]

[DOI]

,

,

,

Torsten Hoefler

Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

Taming unbalanced training workloads in deep learning with partial collective operations.

[BibT_eX]

[DOI]

,

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Identifying scalability bottlenecks for large-scale parallel programs with graph analysis.

[BibT_eX]

[DOI]

,

,

,

Torsten Hoefler

,

,

Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Learning Cost-Effective Sampling Strategies for Empirical Performance Modeling.

[BibT_eX]

[DOI]

,

Alexandru Calotoiu

,

Sebastian Rinke

,

Thorsten Reimann

,

Torsten Hoefler

,

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Communication-Efficient Jaccard similarity for High-Performance Distributed Genome Comparisons.

[BibT_eX]

[DOI]

,

Raghavendra Kanakagiri

,

,

Mikhail Karasikov

,

,

Torsten Hoefler

,

Edgar Solomonik

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis.

[BibT_eX]

[DOI]

Johannes de Fine Licht

,

Grzegorz Kwasniewski

,

Torsten Hoefler

Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

ATUNs: Modular and Scalable Support for Atomic Operations in a Shared Memory Multiprocessor.

[BibT_eX]

[DOI]

,

,

,

Torsten Hoefler

,

Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Augment Your Batch: Improving Generalization Through Instance Repetition.

[BibT_eX]

[DOI]

,

,

,

,

Torsten Hoefler

,

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Engineering Algorithms for Scalability through Continuous Validation of Performance Expectations.

[BibT_eX]

[DOI]

,

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Alexandre Strube

,

IEEE Trans. Parallel Distributed Syst., 2019

Strong consistency is not hard to get: Two-Phase Locking and Two-Phase Commit on Thousands of Cores.

[BibT_eX]

[DOI]

Claude Barthels

,

,

Konstantin Taranov

,

,

Torsten Hoefler

Proc. VLDB Endow., 2019

Demystifying Parallel and Distributed Deep Learning: An In-depth Concurrency Analysis.

[BibT_eX]

[DOI]

,

Torsten Hoefler

ACM Comput. Surv., 2019

Reflecting on the Goal and Baseline for Exascale Computing: A Roadmap Based on Weather and Climate Simulations.

[BibT_eX]

[DOI]

Thomas C. Schulthess

,

,

,

,

Torsten Hoefler

,

Christoph M. Schär

Comput. Sci. Eng., 2019

Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism.

[BibT_eX]

[DOI]

,

,

Vasiliki Kalavri

,

Michael Kapralov

,

Torsten Hoefler

CoRR, 2019

A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.

[BibT_eX]

[DOI]

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

CoRR, 2019

Predicting Weather Uncertainty with Deep Convnets.

[BibT_eX]

[DOI]

Peter Grönquist

,

,

,

,

,

,

Torsten Hoefler

CoRR, 2019

hlslib: Software Engineering for Hardware Design.

[BibT_eX]

[DOI]

Johannes de Fine Licht

,

Torsten Hoefler

CoRR, 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.

[BibT_eX]

[DOI]

,

Berry Weinstein

,

,

,

Torsten Hoefler

,

CoRR, 2019

FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short.

[BibT_eX]

[DOI]

,

Marcel Schneider

,

,

Marek Konieczny

,

Erik Henriksson

,

Salvatore Di Girolamo

,

,

Torsten Hoefler

CoRR, 2019

Graph Processing on FPGAs: Taxonomy, Survey, Challenges.

[BibT_eX]

[DOI]

,

Dimitri Stanojevic

,

Johannes de Fine Licht

,

,

Torsten Hoefler

CoRR, 2019

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.

[BibT_eX]

[DOI]

,

Johannes de Fine Licht

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

CoRR, 2019

Augment your batch: better training with larger batches.

[BibT_eX]

[DOI]

,

,

,

,

Torsten Hoefler

,

CoRR, 2019

Head-of-line blocking avoidance in Slim Fly networks using deadlock-free non-minimal and adaptive routing.

[BibT_eX]

[DOI]

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

Torsten Hoefler

Concurr. Comput. Pract. Exp., 2019

Optimizing the data movement in quantum transport simulations via data-centric parallel programming.

[BibT_eX]

[DOI]

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.

[BibT_eX]

[DOI]

Alexandros Nikolaos Ziogas

,

,

Guillermo Indalecio Fernández

,

,

Mathieu Luisier

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Mitigating network noise on Dragonfly networks through application-aware routing.

[BibT_eX]

[DOI]

Daniele De Sensi

,

Salvatore Di Girolamo

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

SparCML: high-performance sparse communication for machine learning.

[BibT_eX]

[DOI]

Cédric Renggli

,

,

Mehdi Aghagolzadeh

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Streaming message interface: high-performance distributed memory programming on reconfigurable hardware.

[BibT_eX]

[DOI]

Tiziano De Matteis

,

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication.

[BibT_eX]

[DOI]

Grzegorz Kwasniewski

,

,

,

Joost VandeVondele

,

Raffaele Solcà

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Network-accelerated non-contiguous memory transfers.

[BibT_eX]

[DOI]

Salvatore Di Girolamo

,

Konstantin Taranov

,

,

Michael Schaffner

,

,

,

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Slim graph: practical lossy graph compression for approximate graph processing, storage, and analytics.

[BibT_eX]

[DOI]

,

,

Lukas Gianinazzi

,

Robert Gerstenberger

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.

[BibT_eX]

[DOI]

,

Johannes de Fine Licht

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2019

Foreword EuroMPI 2019.

[BibT_eX]

[DOI]

Jesper Larsson Träff

,

Torsten Hoefler

Proceedings of the 26th European MPI Users' Group Meeting, 2019

Corrected trees for reliable group communication.

[BibT_eX]

[DOI]

Martin Küttler

,

,

,

Carsten Weinhold

,

Hermann Härtig

,

,

Torsten Hoefler

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

A fast analytical model of fully associative caches.

[BibT_eX]

[DOI]

,

,

Laurin Brandner

,

Torsten Hoefler

Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Porting the COSMO Weather Model to Manycore CPUs.

[BibT_eX]

[DOI]

,

Stefan Moosbrugger

,

,

,

,

Anton Afanasyev

,

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the Platform for Advanced Scientific Computing Conference, 2019

Invited Talk 2.

[BibT_eX]

[DOI]

Torsten Hoefler

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

SimFS: A Simulation Data Virtualizing File System Interface.

[BibT_eX]

[DOI]

Salvatore Di Girolamo

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning.

[BibT_eX]

[DOI]

,

,

,

Alexandros Nikolaos Ziogas

,

,

Torsten Hoefler

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Using performance models to understand scalable Krylov solver performance at scale for structured grid problems.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Proceedings of the ACM International Conference on Supercomputing, 2019

Substream-Centric Maximum Matchings on FPGA.

[BibT_eX]

[DOI]

,

,

,

Johannes de Fine Licht

,

Torsten Hoefler

Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2019

Embedding Functions Into Reversible Circuits: A Probabilistic Approach to the Number of Lines.

[BibT_eX]

[DOI]

,

Frances Ann Hubis

,

Torsten Hoefler

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

Absinthe: Learning an Analytical Performance Model to Fuse and Tile Stencil Codes in One Shot.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Cache-Oblivious MPI All-to-All Communications Based on Morton Order.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2018

Using Hoare logic for quantum circuit optimization.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Matthias Troyer

CoRR, 2018

Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations.

[BibT_eX]

[DOI]

,

Torsten Hoefler

CoRR, 2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Satoshi Matsuoka

CoRR, 2018

SparCML: High-Performance Sparse Communication for Machine Learning.

[BibT_eX]

[DOI]

Cédric Renggli

,

,

Torsten Hoefler

CoRR, 2018

Automatic Verification of RMA Programs via Abstraction Extrapolation.

[BibT_eX]

[DOI]

,

Andrei Marian Dan

,

,

Torsten Hoefler

,

Martin T. Vechev

Proceedings of the Verification, Model Checking, and Abstract Interpretation, 2018

ShenTu: processing multi-trillion edge graphs on millions of cores in seconds.

[BibT_eX]

[DOI]

,

,

,

,

,

,

,

Torsten Hoefler

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2018

Designing scalable FPGA architectures using high-level synthesis.

[BibT_eX]

[DOI]

Johannes de Fine Licht

,

,

Torsten Hoefler

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Communication-avoiding parallel minimum cuts and connected components.

[BibT_eX]

[DOI]

Lukas Gianinazzi

,

,

Alessandro De Palma

,

,

Torsten Hoefler

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Neural Code Comprehension: A Learnable Representation of Code Semantics.

[BibT_eX]

[DOI]

,

Alice Shoshana Jakobovits

,

Torsten Hoefler

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Convergence of Sparsified Gradient Methods.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Mikael Johansson

,

Nikola Konstantinov

,

,

Cédric Renggli

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Reproducible Floating-Point Aggregation in RDBMSs.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the 34th IEEE International Conference on Data Engineering, 2018

Fast and strongly-consistent per-item resilience in key-value stores.

[BibT_eX]

[DOI]

Konstantin Taranov

,

,

Torsten Hoefler

Proceedings of the Thirteenth EuroSys Conference, 2018

Accelerating Deep Learning Frameworks with Micro-Batches.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Lightweight Requirements Engineering for Exascale Co-design.

[BibT_eX]

[DOI]

Alexandru Calotoiu

,

,

Torsten Hoefler

,

,

Sebastian Rinke

,

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability.

[BibT_eX]

[DOI]

,

Syed Minhaj Hassan

,

Sudhakar Yalamanchili

,

Rachata Ausavarungnirun

,

,

Torsten Hoefler

Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

Log(graph): a near-optimal high-performance graph representation.

[BibT_eX]

[DOI]

,

Dimitri Stanojevic

,

,

,

Maurice Hoerold

,

Torsten Hoefler

Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017

Trends in Data Locality Abstractions for HPC Systems.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 2017

Distributed Join Algorithms on Thousands of Cores.

[BibT_eX]

[DOI]

Claude Barthels

,

,

Torsten Hoefler

,

,

Proc. VLDB Endow., 2017

Designing Databases for Future High-Performance Networks.

[BibT_eX]

[DOI]

Claude Barthels

,

,

Torsten Hoefler

IEEE Data Eng. Bull., 2017

A Communication-Avoiding Parallel Algorithm for the Symmetric Eigenvalue Problem.

[BibT_eX]

[DOI]

Edgar Solomonik

,

,

,

Torsten Hoefler

Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, 2017

Scaling betweenness centrality using communication-efficient sparse matrix multiplication.

[BibT_eX]

[DOI]

Edgar Solomonik

,

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2017

sPIN: high-performance streaming processing in the network.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Salvatore Di Girolamo

,

Konstantin Taranov

,

,

Proceedings of the International Conference for High Performance Computing, 2017

Isoefficiency in Practice: Configuring and Understanding the Performance of Task-based Applications.

[BibT_eX]

[DOI]

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

POSTER: Cache-Oblivious MPI All-to-All Communications on Many-Core Architectures.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Communication-Avoiding Parallel Algorithms for Solving Triangular Systems of Linear Equations.

[BibT_eX]

[DOI]

,

Edgar Solomonik

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

IPDRM Workshop Introduction.

[BibT_eX]

[DOI]

Shuaiwen Leon Song

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Capability Models for Manycore Memory Systems: A Case-Study with Xeon Phi KNL.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Corrected Gossip Algorithms for Fast Reliable Broadcast on Unreliable Systems.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

EMBRACE Keynote.

[BibT_eX]

[DOI]

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Transparent Caching for RMA Systems.

[BibT_eX]

[DOI]

Salvatore Di Girolamo

,

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

SlimSell: A Vectorizable Graph Representation for Breadth-First Search.

[BibT_eX]

[DOI]

,

Florian Marending

,

Edgar Solomonik

,

Torsten Hoefler

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Model-Driven Choice of Numerical Methods for the Solution of the Linear Advection Equation.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Thomas C. Schulthess

Proceedings of the International Conference on Computational Science, 2017

AllConcur: Leaderless Concurrent Atomic Broadcast.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations.

[BibT_eX]

[DOI]

,

Michal Podstawski

,

,

Edgar Solomonik

,

Torsten Hoefler

Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing, 2017

An Effective Queuing Scheme to Provide Slim Fly Topologies with HoL Blocking Reduction and Deadlock Freedom for Minimal-Path Routing.

[BibT_eX]

[DOI]

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

Torsten Hoefler

Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

Improving Non-minimal and Adaptive Routing Algorithms in Slim Fly Networks.

[BibT_eX]

[DOI]

,

Jesús Escudero-Sahuquillo

,

Pedro Javier García

,

Francisco J. Quiles

,

Torsten Hoefler

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches.

[BibT_eX]

[DOI]

,

,

,

Keith D. Underwood

,

Torsten Hoefler

Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

Multi-agent Pathfinding with n Agents on Graphs with n Vertices: Combinatorial Classification and Tight Algorithmic Bounds.

[BibT_eX]

[DOI]

Klaus-Tycho Foerster

,

,

Torsten Hoefler

,

,

,

Roger Wattenhofer

Proceedings of the Algorithms and Complexity - 10th International Conference, 2017

2016

Automatic Performance Modeling of HPC Applications.

[BibT_eX]

[DOI]

,

Christian H. Bischof

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Christian Iwainsky

,

Grzegorz Kwasniewski

,

,

,

Alexandre Strube

,

,

Proceedings of the Software for Exascale Computing - SPPEXA 2013-2015, 2016

Cache Line Aware Algorithm Design for Cache-Coherent Architectures.

[BibT_eX]

[DOI]

,

Torsten Hoefler

IEEE Trans. Parallel Distributed Syst., 2016

Exploiting Offload-Enabled Network Interfaces.

[BibT_eX]

[DOI]

Salvatore Di Girolamo

,

,

Keith D. Underwood

,

Torsten Hoefler

IEEE Micro, 2016

On noise and the performance benefit of nonblocking collectives.

[BibT_eX]

[DOI]

Patrick M. Widener

,

,

Kurt B. Ferreira

,

Torsten Hoefler

Int. J. High Perform. Comput. Appl., 2016

Betweenness Centrality is more Parallelizable than Dense Matrix Multiplication.

[BibT_eX]

[DOI]

Edgar Solomonik

,

,

,

Torsten Hoefler

CoRR, 2016

AllConcur: Leaderless Concurrent Atomic Broadcast (Extended Version).

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

CoRR, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.

[BibT_eX]

[DOI]

William M. Tang

,

,

Stéphane Ethier

,

Grzegorz Kwasniewski

,

Torsten Hoefler

,

Khaled Z. Ibrahim

,

,

Samuel Williams

,

,

Carlos Rosales-Fernandez

,

Timothy J. Williams

Proceedings of the International Conference for High Performance Computing, 2016

A PCIe congestion-aware performance model for densely populated accelerator servers.

[BibT_eX]

[DOI]

Maxime Martinasso

,

Grzegorz Kwasniewski

,

,

Thomas C. Schulthess

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2016

dCUDA: hardware supported overlap of computation and communication.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2016

Scheduling-aware routing for supercomputers.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2016

Selecting Technical Papers for an Interdisciplinary Conference: The PASC Review Process.

[BibT_eX]

[DOI]

Torsten Hoefler

Proceedings of the Platform for Advanced Scientific Computing Conference, 2016

Modeling and analysis of remote memory access programming.

[BibT_eX]

[DOI]

Andrei Marian Dan

,

,

Torsten Hoefler

,

Martin T. Vechev

Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, 2016

Polly-ACC: Transparent compilation to heterogeneous hardware.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 2016 International Conference on Supercomputing, 2016

SDNsec: Forwarding Accountability for the SDN Data Plane.

[BibT_eX]

[DOI]

Takayuki Sasaki

,

Christos Pappas

,

,

Torsten Hoefler

,

Proceedings of the 25th International Conference on Computer Communication and Networks, 2016

High-Performance Distributed RMA Locks.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Routing on the Dependency Graph: A New Approach to Deadlock-Free High-Performance Routing.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

Fast Multi-parameter Performance Modeling.

[BibT_eX]

[DOI]

Alexandru Calotoiu

,

David Beckingsale

,

Christopher W. Earl

,

Torsten Hoefler

,

,

,

Proceedings of the 2016 IEEE International Conference on Cluster Computing, 2016

2015

Remote Memory Access Programming in MPI-3.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

,

,

,

Keith D. Underwood

ACM Trans. Parallel Comput., 2015

Introduction to the Special Issue on SPAA 2013.

[BibT_eX]

[DOI]

,

Torsten Hoefler

ACM Trans. Parallel Comput., 2015

Sparse Tensor Algebra as a Parallel Programming Model.

[BibT_eX]

[DOI]

Edgar Solomonik

,

Torsten Hoefler

CoRR, 2015

Cost-effective diameter-two topologies: analysis and evaluation.

[BibT_eX]

[DOI]

Georgios Kathareios

,

Cyriel Minkenberg

,

Bogdan Prisacari

,

Germán Rodríguez

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2015

Scientific benchmarking of parallel computing systems: twelve ways to tell the masses when reporting performance results.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Proceedings of the International Conference for High Performance Computing, 2015

HIPS-LSPP Keynotes.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Laxmikant V. Kalé

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Notified Access: Extending Remote Memory Access Programming Models for Producer-Consumer Synchronization.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Exascaling Your Library: Will Your Implementation Meet Your Expectations?

[BibT_eX]

[DOI]

,

Alexandru Calotoiu

,

Torsten Hoefler

,

Alexandre Strube

,

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

MODESTO: Data-centric Analytic Optimization of Complex Stencil Programs on Heterogeneous Architectures.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Cache Line Aware Optimizations for ccNUMA Systems.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

DARE: High-Performance State Machine Replication on RDMA Networks.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Accelerating Irregular Computations with Hardware Transactional Memory and Active Messages.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Distributing the Data Plane for Remote Storage Access.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Proceedings of the 15th Workshop on Hot Topics in Operating Systems, 2015

Source-Based Path Selection: The Data Plane Perspective.

[BibT_eX]

[DOI]

,

Christos Pappas

,

Cristina Basescu

,

,

Torsten Hoefler

,

Proceedings of the 10th International Conference on Future Internet, 2015

Evaluating the Cost of Atomic Operations on Modern Architectures.

[BibT_eX]

[DOI]

Hermann Schweizer

,

,

Torsten Hoefler

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Using Compiler Techniques to Improve Automatic Performance Modeling.

[BibT_eX]

[DOI]

Arnamoy Bhattacharyya

,

Grzegorz Kwasniewski

,

Torsten Hoefler

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Energy, Memory, and Runtime Tradeoffs for Implementing Collective Communication Operations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Supercomput. Front. Innov., 2014

Enabling highly-scalable remote memory access programming with MPI-3 One Sided.

[BibT_eX]

[DOI]

Robert Gerstenberger

,

,

Torsten Hoefler

Sci. Program., 2014

Application-oriented ping-pong benchmarking: how to assess the real communication overheads.

[BibT_eX]

[DOI]

,

Robert Gerstenberger

,

Torsten Hoefler

Computing, 2014

Improved MPI collectives for MPI processes in shared address spaces.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

,

Clust. Comput., 2014

Automatic complexity analysis of explicitly parallel programs.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Grzegorz Kwasniewski

Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

Understanding the Effects of Communication and Coordination on Checkpointing at Scale.

[BibT_eX]

[DOI]

Kurt B. Ferreira

,

Patrick M. Widener

,

,

Dorian C. Arnold

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2014

Fail-in-Place Network Design: Interaction Between Topology, Routing Algorithm and Failures.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Satoshi Matsuoka

Proceedings of the International Conference for High Performance Computing, 2014

Slim Fly: A Cost Effective Low-Diameter Network Topology.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the International Conference for High Performance Computing, 2014

Exploring the effect of noise on the performance benefit of nonblocking allreduce.

[BibT_eX]

[DOI]

Patrick M. Widener

,

Kurt B. Ferreira

,

,

Torsten Hoefler

Proceedings of the 21st European MPI Users' Group Meeting, 2014

Designing Bit-Reproducible Portable High-Performance Applications.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks.

[BibT_eX]

[DOI]

Bogdan Prisacari

,

Germán Rodríguez

,

Philip Heidelberger

,

,

Cyriel Minkenberg

,

Torsten Hoefler

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Fault tolerance for remote memory access programming models.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

Catwalk: A Quick Development Path for Performance Models.

[BibT_eX]

[DOI]

,

Christian H. Bischof

,

Torsten Hoefler

,

,

,

Alexandru Calotoiu

,

Christian Iwainsky

,

Alexandre Strube

,

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

PEMOGEN: automatic adaptive performance modeling during program runtime.

[BibT_eX]

[DOI]

Arnamoy Bhattacharyya

,

Torsten Hoefler

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Fast pattern-specific routing for fat tree networks.

[BibT_eX]

[DOI]

Bogdan Prisacari

,

Germán Rodríguez

,

Cyriel Minkenberg

,

Torsten Hoefler

ACM Trans. Archit. Code Optim., 2013

Operating systems and runtime environments on supercomputers.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Int. J. High Perform. Comput. Appl., 2013

MPI + MPI: a new hybrid approach to parallel programming with MPI plus shared memory.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Darius Buntinas

,

,

,

,

,

,

Computing, 2013

Using Simulation to Evaluate the Performance of Resilience Strategies at Scale.

[BibT_eX]

[DOI]

,

,

Kurt B. Ferreira

,

Dorian C. Arnold

,

Torsten Hoefler

,

Patrick M. Widener

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Hybrid MPI: efficient message passing for multi-core systems.

[BibT_eX]

[DOI]

Andrew Friedley

,

Greg Bronevetsky

,

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the International Conference for High Performance Computing, 2013

Using automated performance modeling to find scalability bugs in complex codes.

[BibT_eX]

[DOI]

Alexandru Calotoiu

,

Torsten Hoefler

,

,

Proceedings of the International Conference for High Performance Computing, 2013

MPI datatype processing using runtime compilation.

[BibT_eX]

[DOI]

,

Fredrik Kjolstad

,

Torsten Hoefler

Proceedings of the 20th European MPI Users' Group Meeting, 2013

Ownership passing: efficient distributed memory programming on multi-core systems.

[BibT_eX]

[DOI]

Andrew Friedley

,

Torsten Hoefler

,

Greg Bronevetsky

,

Andrew Lumsdaine

,

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Compiler Optimizations for Non-contiguous Remote Data Movement.

[BibT_eX]

[DOI]

,

Robert Gerstenberger

,

Torsten Hoefler

Proceedings of the Languages and Compilers for Parallel Computing, 2013

Bandwidth-optimal all-to-all exchanges in fat tree networks.

[BibT_eX]

[DOI]

Bogdan Prisacari

,

Germán Rodríguez

,

Cyriel Minkenberg

,

Torsten Hoefler

Proceedings of the International Conference on Supercomputing, 2013

Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

,

Brian W. Barrett

,

Proceedings of the 42nd International Conference on Parallel Processing, 2013

Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi.

[BibT_eX]

[DOI]

,

Torsten Hoefler

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

NUMA-aware shared-memory collective communication for MPI.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

Topic 13: High-Performance Networks and Communication - (Introduction).

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

,

Davide Bertozzi

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

2012

Extensions for next-generation parallel programming models.

[BibT_eX]

[DOI]

Torsten Hoefler

Parallel Comput., 2012

Top Picks from Hot Interconnects 2011: Petascale Network Architectures.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Patrick Geoffray

,

Fabrizio Petrini

,

Jesper Larsson Träff

IEEE Micro, 2012

Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Bronis R. de Supinski

,

William D. Gropp

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Optimization principles for collective neighborhood communications.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Micro-applications for Communication Data Access Patterns and MPI Datatypes.

[BibT_eX]

[DOI]

,

Robert Gerstenberger

,

Torsten Hoefler

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Exact Dependence Analysis for Increased Communication Overlap.

[BibT_eX]

[DOI]

Simone Pellegrini

,

Torsten Hoefler

,

Thomas Fahringer

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Darius Buntinas

,

,

Brian W. Barrett

,

,

,

,

Proceedings of the Recent Advances in the Message Passing Interface, 2012

Automatic datatype generation and optimization.

[BibT_eX]

[DOI]

Fredrik Kjolstad

,

Torsten Hoefler

,

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Communication-centric optimizations by dynamically detecting collective operations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Assessing HPC Failure Detectors for MPI Jobs.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the 20th Euromicro International Conference on Parallel, 2012

On the Effects of CPU Caches on MPI Point-to-Point Communications.

[BibT_eX]

[DOI]

Simone Pellegrini

,

Torsten Hoefler

,

Thomas Fahringer

Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Productive Parallel Linear Algebra Programming with Unstructured Topology Adaption.

[BibT_eX]

[DOI]

Peter Gottschling

,

Torsten Hoefler

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

Performance Modeling and Comparative Analysis of the MILC Lattice QCD Application su3_rmd.

[BibT_eX]

[DOI]

,

Steven Gottlieb

,

Torsten Hoefler

Proceedings of the 12th IEEE/ACM International Symposium on Cluster, 2012

Runtime detection and optimization of collective communication patterns.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

MPI on Millions of Cores.

[BibT_eX]

[DOI]

,

Darius Buntinas

,

,

,

Torsten Hoefler

,

,

,

,

Jesper Larsson Träff

Parallel Process. Lett., 2011

The scalable process topology interface of MPI 2.2.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Rolf Rabenseifner

,

Hubert Ritzdorf

,

Bronis R. de Supinski

,

,

Jesper Larsson Träff

Concurr. Comput. Pract. Exp., 2011

Methods of creating student cluster competition teams.

[BibT_eX]

[DOI]

Stephen Lien Harrell

,

Preston M. Smith

,

,

Torsten Hoefler

,

Anna A. Labutina

,

Trinity Overmyer

Proceedings of the 2011 TeraGrid Conference - Extreme Digital Discovery, 2011

Performance modeling for systematic performance tuning.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Design and Evaluation of Nonblocking Collective I/O Operations.

[BibT_eX]

[DOI]

Vishwanath Venkatesan

,

Mohamad Chaarawi

,

,

Torsten Hoefler

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Performance Expectations and Guidelines for MPI Derived Datatypes.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

,

Jesper Larsson Träff

Proceedings of the Recent Advances in the Message Passing Interface, 2011

Active pebbles: a programming model for highly parallel fine-grained data-driven computations.

[BibT_eX]

[DOI]

Jeremiah Willcock

,

Torsten Hoefler

,

Nicholas Gerard Edmonds

,

Andrew Lumsdaine

Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

Kanor - A Declarative Language for Explicit Communication.

[BibT_eX]

[DOI]

,

William E. Byrd

,

Jeremiah Willcock

,

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the Practical Aspects of Declarative Languages, 2011

HIPS Introduction.

[BibT_eX]

[DOI]

Torsten Hoefler

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Deadlock-Free Oblivious Routing for Arbitrary Topologies.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Wolfgang E. Nagel

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Active pebbles: parallel programming for data-driven applications.

[BibT_eX]

[DOI]

Jeremiah Willcock

,

Torsten Hoefler

,

Nicholas Gerard Edmonds

,

Andrew Lumsdaine

Proceedings of the 25th International Conference on Supercomputing, Tucson, AZ, USA, May 31, 2011

Generic topology mapping strategies for large-scale parallel architectures.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Proceedings of the 25th International Conference on Supercomputing, Tucson, AZ, USA, May 31, 2011

Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010

Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Int. J. Parallel Emergent Distributed Syst., 2010

Software and Hardware Techniques for Power-Efficient HPC Networking.

[BibT_eX]

[DOI]

Torsten Hoefler

Comput. Sci. Eng., 2010

Characterizing the Influence of System Noise on Large-Scale Applications by Simulation.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the Conference on High Performance Computing Networking, 2010

Toward Performance Models of MPI Implementations for Understanding Application Scaling Issues.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

Jesper Larsson Träff

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Steven Gottlieb

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Efficient MPI Support for Advanced Hybrid Programming Models.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Greg Bronevetsky

,

,

Bronis R. de Supinski

,

Andrew Lumsdaine

Proceedings of the Recent Advances in the Message Passing Interface, 2010

Scalable communication protocols for dynamic sparse data exchange.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Christian Siebert

,

Andrew Lumsdaine

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

LogGOPSim: simulating large-scale applications in the LogGOPS model.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

The PERCS High-Performance Interconnect.

[BibT_eX]

[DOI]

L. Baba Arimilli

,

,

,

,

Wolfgang E. Denzel

,

,

Torsten Hoefler

,

,

,

,

,

Ramakrishnan Rajamony

Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

A space-efficient parallel algorithm for computing betweenness centrality in distributed memory.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 2010 International Conference on High Performance Computing, 2010

Bridging Performance Analysis Tools and Analytic Performance Modeling for HPC.

[BibT_eX]

[DOI]

Torsten Hoefler

Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

AM++: a generalized active message framework.

[BibT_eX]

[DOI]

Jeremiah Willcock

,

Torsten Hoefler

,

Nicholas Gerard Edmonds

,

Andrew Lumsdaine

Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009

LogGP in theory and practice - An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Simul. Model. Pract. Theory, 2009

The Effect of Network Noise on Large-Scale Collective Communications.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Parallel Process. Lett., 2009

Towards Efficient MapReduce Using MPI.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

,

Jack J. Dongarra

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2009

Implementation and analysis of nonblocking collective operations on SCI networks.

[BibT_eX]

[DOI]

Christian Kaiser

,

Torsten Hoefler

,

,

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Sparse collective operations for MPI.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Jesper Larsson Träff

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

The impact of network noise at large-scale communication performance.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A power-aware, application-based performance study of modern commodity cluster interconnection networks.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Group Operation Assembly Language - A Flexible Way to Express Collective Communication.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Christian Siebert

,

Andrew Lumsdaine

Proceedings of the ICPP 2009, 2009

Optimized Routing for Large-Scale InfiniBand Networks.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

Demand-driven execution of static directed acyclic graphs using task parallelism.

[BibT_eX]

[DOI]

Prabhanjan Kambadur

,

,

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 16th International Conference on High Performance Computing, 2009

2008

Leveraging non-blocking collective communication in high-performance applications.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Peter Gottschling

,

Andrew Lumsdaine

Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

Communication Optimization for Medical Image Reconstruction Algorithms.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Maraike Schellmann

,

Sergei Gorlatch

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Sparse Non-blocking Collectives in Quantum Mechanical Calculations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Florian Lorenzen

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2008

Accurately measuring collective operations at massive scale.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Optimizing non-blocking collective operations for InfiniBand.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Adaptive Routing Strategies for Modern High Performance Networks.

[BibT_eX]

[DOI]

Patrick Geoffray

,

Torsten Hoefler

Proceedings of the 16th Annual IEEE Symposium on High Performance Interconnects (HOTI 2008), 2008

Multistage switches are not crossbars: Effects of static routing in high-performance networks.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Message progression in parallel computing - to thread or not to thread?

[BibT_eX]

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

Overlapping Communication and Computation with High Level Communication Routines.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

An Optimized ZGEMM Implementation for the Cell BE.

[BibT_eX]

[DOI]

,

Torsten Hoefler

,

Simon Wunderlich

,

,

Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA) held at the 21st Conference on the Architecture of Computing Systems (ARCS), 2008

2007

Optimizing a conjugate gradient solver with non-blocking collective operations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Peter Gottschling

,

Andrew Lumsdaine

,

Parallel Comput., 2007

Implementation and performance analysis of non-blocking collective operations for MPI.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Andrew Lumsdaine

,

Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

A Case for Standard Non-blocking Collective Operations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Prabhanjan Kambadur

,

Richard L. Graham

,

Galen M. Shipman

,

Andrew Lumsdaine

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 14th European PVM/MPI User's Group Meeting, Paris, France, September 30, 2007

A practically constant-time MPI Broadcast Algorithm for large-scale InfiniBand Clusters with Multicast.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Christian Siebert

,

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Low-Overhead LogGP Parameter Assessment for Modern Interconnection Networks.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Netgauge: A Network Performance Measurement Framework.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

Andrew Lumsdaine

,

Proceedings of the High Performance Computing and Communications, 2007

2006

IRS - A Portable Interface for Reconfigurable Systems.

[BibT_eX]

[DOI]

,

,

Torsten Hoefler

,

,

Proceedings of the Fifth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2006), 2006

Assessing Single-Message and Multi-Node Communication Performance of InfiniBand.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Carsten Viertel

,

,

,

Proceedings of the Fifth International Conference on Parallel Computing in Electrical Engineering (PARELEC 2006), 2006

A Case for Non-blocking Collective Operations.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Jeffrey M. Squyres

,

,

Andrew Lumsdaine

Proceedings of the Frontiers of High Performance Computing and Networking, 2006

LogfP - a model for small messages in InfiniBand.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Fast barrier synchronization for InfiniBand™.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack.

[BibT_eX]

[DOI]

,

,

Robert Baumgartl

,

,

Torsten Hoefler

,

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Adding Low-Cost Hardware Barrier Support to Small Commodity Clusters.

[BibT_eX]

[DOI]

Torsten Hoefler

,

,

,

Proceedings of the ARCS 2006, 2006

2005

A Practical Approach to the Rating of Barrier Algorithms Using the LogP Model and Open MPI.

[BibT_eX]

[DOI]

Torsten Hoefler

,

Lavinio Cerquetti

,

,

,

Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005
