Timo Schneider

Orcid: 0000-0002-4884-3934

According to our database, Timo Schneider authored at least 52 papers between 2008 and 2024.

Bibliography

2024
RED-SEA Project: Towards a new-generation European interconnect.
Microprocess. Microsystems, 2024

FPsPIN: An FPGA-based Open-Hardware Research Platform for Processing in the Network.
CoRR, 2024

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming.
CoRR, 2024

SMaRTT-REPS: Sender-based Marked Rapidly-adapting Trimmed & Timed Transport with Recycled Entropies.
CoRR, 2024

OSMOSIS: Enabling Multi-Tenancy in Datacenter SmartNICs.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

2023
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs.
Proceedings of the International Conference for High Performance Computing, 2023

2022
Deinsum: Practically I/O Optimal Multilinear Algebra.
CoRR, 2022

Deinsum: Practically I/O Optimal Multi-Linear Algebra.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Building Blocks for Network-Accelerated Distributed File Systems.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

Lifting C semantics for dataflow optimization.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022


Assessing the Complexity of DC-System Simulations.
Proceedings of the IEEE Workshop on Complexity in Engineering, 2022

2021
High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks.
IEEE Trans. Parallel Distributed Syst., 2021

Pebbles, Graphs, and a Pinch of Combinatorics: Towards Tight I/O Lower Bounds for Statically Analyzable Programs.
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021

Productivity, portability, performance: data-centric Python.
Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal matrix factorizations.
Proceedings of the International Conference for High Performance Computing, 2021

On the parallel I/O optimality of linear algebra kernels: near-optimal LU factorization.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

A RISC-V in-network accelerator for flexible high-performance low-power packet processing.
Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

NPBench: a benchmarking suite for high-performance NumPy.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

2020
PsPIN: A high-performance low-power architecture for flexible in-network compute.
CoRR, 2020

High-Performance Routing with Multipathing and Path Diversity in Supercomputers and Data Centers.
CoRR, 2020

Communication and Timing Issues with MPI Virtualization.
Proceedings of the EuroMPI/USA '20: 27th European MPI Users' Group Meeting, 2020

2019
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations.
CoRR, 2019

Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs.
CoRR, 2019

Optimizing the data movement in quantum transport simulations via data-centric parallel programming.
Proceedings of the International Conference for High Performance Computing, 2019

A data-centric approach to extreme-scale ab initio dissipative quantum transport simulations.
Proceedings of the International Conference for High Performance Computing, 2019

Network-accelerated non-contiguous memory transfers.
Proceedings of the International Conference for High Performance Computing, 2019

Stateful dataflow multigraphs: a data-centric model for performance portability on heterogeneous architectures.
Proceedings of the International Conference for High Performance Computing, 2019

2017
Distributed Join Algorithms on Thousands of Cores.
Proc. VLDB Endow., 2017

Fast Networks and Slow Memories: A Mechanism for Mitigating Bandwidth Mismatches.
Proceedings of the 25th IEEE Annual Symposium on High-Performance Interconnects, 2017

2016
Ensuring Deadlock-Freedom in Low-Diameter InfiniBand Networks.
Proceedings of the 24th IEEE Annual Symposium on High-Performance Interconnects, 2016

2014
Application-oriented ping-pong benchmarking: how to assess the real communication overheads.
Computing, 2014

Manifold Alignment for Person Independent Appearance-Based Gaze Estimation.
Proceedings of the 22nd International Conference on Pattern Recognition, 2014

2013
MPI datatype processing using runtime compilation.
Proceedings of the 20th European MPI Users' Group Meeting, 2013

Compiler Optimizations for Non-contiguous Remote Data Movement.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

Protocols for Fully Offloaded Collective Operations on Accelerated Network Adapters.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

2012
Optimization principles for collective neighborhood communications.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Micro-applications for Communication Data Access Patterns and MPI Datatypes.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

Communication-centric optimizations by dynamically detecting collective operations.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

Runtime detection and optimization of collective communication patterns.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Kernel-Based Offload of Collective Operations - Implementation, Evaluation and Lessons Learned.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010
Accurately measuring overhead, communication time and progression of blocking and nonblocking collective operations at massive scale.
Int. J. Parallel Emergent Distributed Syst., 2010

Characterizing the Influence of System Noise on Large-Scale Applications by Simulation.
Proceedings of the Conference on High Performance Computing Networking, 2010

LogGOPSim: simulating large-scale applications in the LogGOPS model.
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, 2010

2009
LogGP in theory and practice - An in-depth analysis of modern interconnection networks and benchmarking methods for collective operations.
Simul. Model. Pract. Theory, 2009

The Effect of Network Noise on Large-Scale Collective Communications.
Parallel Process. Lett., 2009

The impact of network noise at large-scale communication performance.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A power-aware, application-based performance study of modern commodity cluster interconnection networks.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Optimized Routing for Large-Scale InfiniBand Networks.
Proceedings of the 17th IEEE Symposium on High Performance Interconnects, 2009

2008
Accurately measuring collective operations at massive scale.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Multistage switches are not crossbars: Effects of static routing in high-performance networks.
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008

An Optimized ZGEMM Implementation for the Cell BE.
Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA) held at the 21st Conference on the Architecture of Computing Systems (ARCS), 2008

