Holger Fröning

ORCID: 0000-0001-9562-0680

  • Heidelberg University, Institute of Computer Engineering, Germany

According to our database, Holger Fröning authored at least 104 papers between 2002 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.





On Hardening DNNs against Noisy Computations.
CoRR, January, 2025

GraphScale: Scalable Processing on FPGAs for HBM and Large Graphs.
ACM Trans. Reconfigurable Technol. Syst., June, 2024

Resource-Efficient Neural Networks for Embedded Systems.
J. Mach. Learn. Res., 2024

Function Space Diversity for Uncertainty Prediction via Repulsive Last-Layer Ensembles.
CoRR, 2024

Less Memory Means smaller GPUs: Backpropagation with Compressed Activations.
CoRR, 2024

DeepHYDRA: Resource-Efficient Time-Series Anomaly Detection in Dynamically-Configured Systems.
CoRR, 2024

GraphMatch: Subgraph Query Processing on FPGAs.
CoRR, 2024

Walking Noise: On Layer-Specific Robustness of Neural Architectures Against Noisy Computations and Associated Characteristic Learning Dynamics.
Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track, 2024

DeepHYDRA: A Hybrid Deep Learning and DBSCAN-Based Approach to Time-Series Anomaly Detection in Dynamically-Configured Systems.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

CLAIRE-ROP: Rapid Partitioning-based Deformable Image Registration on Multi-GPU Accelerator.
Proceedings of the 2024 8th International Conference on Medical and Health Informatics, 2024

Non-relational Databases on FPGAs: Survey, Design Decisions, Challenges.
ACM Comput. Surv., November, 2023

Compressing the Backward Pass of Large-Scale Neural Architectures by Structured Activation Pruning.
CoRR, 2023

On Performance Analysis of Graphcore IPUs: Analyzing Squared and Skewed Matrix Multiplication.
CoRR, 2023

Characterization of data compression across CPU platforms and accelerators.
Concurr. Comput. Pract. Exp., 2023

Reducing Memory Requirements for the IPU using Butterfly Factorizations.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

On the Non-associativity of Analog Computations.
Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023

Implications of Noise in Resistive Memory on Deep Neural Networks for Image Classification.
Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023

Implementation Techniques for SPMD Kernels on CPUs.
Proceedings of the 2023 International Workshop on OpenCL, 2023

CUDAsap: Statically-Determined Execution Statistics as Alternative to Execution-Based Profiling.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

Joint Program and Layout Transformations to Enable Convolutional Operators on Specialized Hardware Based on Constraint Programming.
ACM Trans. Archit. Code Optim., 2022

Walking Noise: Understanding Implications of Noisy Computations on Classification Tasks.
CoRR, 2022

Towards Hardware-Specific Automatic Compression of Neural Networks.
CoRR, 2022

HW-Aware Initialization of DNN Auto-Tuning to Improve Exploration Time and Robustness.
CoRR, 2022

Compiler-aided nd-range parallel-for implementations on CPU in hipSYCL.
Proceedings of the IWOCL'22: International Workshop on OpenCL, Bristol, United Kingdom, May 10, 2022

GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

PipeJSON: Parsing JSON at Line Speed on FPGAs.
Proceedings of the International Conference on Management of Data, 2022

A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels.
ACM Trans. Archit. Code Optim., 2021

Scheduling of Graph Queries: Controlling Intra- and Inter-query Parallelism for a High System Throughput.
CoRR, 2021

The Programming of Deep Learning Accelerators as a Constraint Satisfaction Problem.
CoRR, 2021

Understanding Cache Boundness of ML Operators on ARM Processors.
CoRR, 2021

Demystifying memory access patterns of FPGA-based graph processing accelerators.
Proceedings of the GRADES-NDA '21: Proceedings of the 4th ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), 2021

Towards Addressing Noise and Static Variations of Analog Computations Using Efficient Retraining.
Proceedings of the Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021

Exploring Memory Access Patterns for Graph Processing Accelerators.
Proceedings of the Datenbanksysteme für Business, 2021

cCUDA: Effective Co-Scheduling of Concurrent Kernels on GPUs.
IEEE Trans. Parallel Distributed Syst., 2020

Resource-Efficient Speech Mask Estimation for Multi-Channel Speech Enhancement.
CoRR, 2020

Resource-Efficient Neural Networks for Embedded Systems.
CoRR, 2020

On the Difficulty of Designing Processor Arrays for Deep Neural Networks.
Proceedings of the IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, 2020

Search Space Complexity of Iteration Domain Based Instruction Embedding for Deep Learning Accelerators.
Proceedings of the IoT Streams for Data-Driven Predictive Maintenance and IoT, Edge, and Mobile for Embedded Machine Learning, 2020

Parameterized Structured Pruning for Deep Neural Networks.
Proceedings of the Machine Learning, Optimization, and Data Science, 2020

On Resource-Efficient Bayesian Network Classifiers and Deep Neural Networks.
Proceedings of the 25th International Conference on Pattern Recognition, 2020

Assessing the Overhead of Offloading Compression Tasks.
Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020, 2020

Automated Partitioning of Data-Parallel Kernels using Polyhedral Compilation.
Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020, 2020

On Network Locality in MPI-Based HPC Applications.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

Towards Real-Time Single-Channel Singing-Voice Separation with Pruned Multi-Scaled Densenets.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Metric Selection for GPU Kernel Classification.
ACM Trans. Archit. Code Optim., 2019

On link width scaling for energy-proportional direct interconnection networks.
Concurr. Comput. Pract. Exp., 2019

Constructing virtual 5-dimensional tori out of lower-dimensional network cards.
Concurr. Comput. Pract. Exp., 2019

CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Training Discrete-Valued Neural Networks with Sign Activations Using Weight Distributions.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2019

Software-Based Buffering of Associative Operations on Random Memory Addresses.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

Effects of Congestion Management on Energy Saving Techniques in Interconnection Networks.
Proceedings of the 5th International Workshop on High-Performance Interconnection Networks in the ExaScale and Big-Data Era, 2019

Quantifying the NUMA Behavior of Partitioned GPGPU Applications.
Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2019

Efficient and Robust Machine Learning for Real-World Systems.
CoRR, 2018

Heterogeneous and unconventional cluster architectures and applications.
Concurr. Comput. Pract. Exp., 2018

Towards Efficient Forward Propagation on Resource-Constrained Systems.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2018

Resource Efficient Deep Eigenvector Beamforming.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Evaluating Energy-Saving Strategies on Torus, K-Ary N-Tree, and Dragonfly.
Proceedings of the 4th IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2018

Buffer Provisioning for Large-Scale Data-Acquisition Systems.
Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems, 2018

InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU.
Int. J. High Perform. Comput. Appl., 2017

An Overview of MPI Characteristics of Exascale Proxy Applications.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Modeling and Validating Time, Buffering, and Utilization of a Large-Scale, Real-Time Data Acquisition System.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Early Experiences with Saving Energy in Direct Interconnection Networks.
Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

A Case Study on Implementing Virtual 5D Torus Networks Using Network Components of Lower Dimensionality.
Proceedings of the 3rd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era, 2017

Can Modern Graph Processing Engines Run Concurrent Queries Efficiently?
Proceedings of the Fifth International Workshop on Graph Data-management Experiences & Systems, 2017

Linking Application Description with Efficient SIMD Code Generation for Low-Precision Signed-Integer GEMM.
Proceedings of the Euro-Par 2017: Parallel Processing Workshops, 2017

Optimizing the data-collection time of a large-scale data-acquisition system through a simulation framework.
J. Supercomput., 2016

Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy.
Parallel Comput., 2016

Heterogeneous cluster architectures and applications.
Concurr. Comput. Pract. Exp., 2016

SONAR: Automated Communication Characterization for HPC Applications.
Proceedings of the High Performance Computing, 2016

Exploring Time and Energy for Complex Accesses to a Hybrid Memory Cube.
Proceedings of the Second International Symposium on Memory Systems, 2016

Optimizing communication for a 2D-partitioned scalable BFS.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

Analyzing the Energy (Dis-) Proportionality of Scalable Interconnection Networks.
Proceedings of the 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era HiPINEB@HPCA 2016, 2016

On the design of a new dynamic credit-based end-to-end flow control mechanism for HPC clusters.
Parallel Comput., 2015

Analyzing communication models for distributed thread-collaborative processors in terms of energy and time.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Highspeed Graph Processing Exploiting Main-Memory Column Stores.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

Modeling a Large Data-Acquisition Network in a Simulation Framework.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Special issue on unconventional cluster architectures and applications.
Clust. Comput., 2014

Energy-efficient stencil computations on distributed GPUs using dynamic parallelism and GPU-controlled communication.
Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014

Infiniband-Verbs on GPU: A Case Study of Controlling an Infiniband Network Device from the GPU.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Analyzing Put/Get APIs for Thread-Collaborative Processors.
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014

Energy-Efficient Collective Reduce and Allreduce Operations on Distributed GPUs.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Data Movement Options in Accelerated Clusters.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Oncilla: A GAS runtime for efficient resource allocation and data movement in accelerated clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

GGAS: Global GPU address spaces for efficient communication in heterogeneous clusters.
Proceedings of the 2013 IEEE International Conference on Cluster Computing, 2013

On Achieving High Message Rates.
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

A new degree of freedom for memory allocation in clusters.
Clust. Comput., 2012

A New End-to-End Flow-Control Mechanism for High Performance Computing Clusters.
Proceedings of the 2012 IEEE International Conference on Cluster Computing, 2012

Network Interfaces.
Proceedings of the Encyclopedia of Parallel Computing, 2011

MEMSCALE™: A Scalable Environment for Databases.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Unleash Your Memory-Constrained Applications: A 32-Node Non-coherent Distributed-Memory Prototype Cluster.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Highly scalable barriers for future high-performance computing clusters.
Proceedings of the 18th International Conference on High Performance Computing, 2011

MEMSCALE: in-cluster-memory databases.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011

Efficient hardware support for the Partitioned Global Address Space.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Getting Rid of Coherency Overhead for Memory-Hungry Applications.
Proceedings of the 2010 IEEE International Conference on Cluster Computing, 2010

An FPGA-Based Custom High Performance Interconnection Network.
Proceedings of the ReConFig'09: 2009 International Conference on Reconfigurable Computing and FPGAs, 2009

Efficient Virtualization of High-Performance Network Interfaces.
Proceedings of the Eighth International Conference on Networks, 2009

An FPGA based verification platform for HyperTransport 3.x.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009

A HyperTransport 3 Physical Layer Interface for FPGAs.
Proceedings of the Reconfigurable Computing: Architectures, 2009

VELO: A Novel Communication Engine for Ultra-Low Latency Message Transfers.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Architectural improvements of interconnection network interfaces.
PhD thesis, 2007

Swordfish: A Simulator for High-Performance Networks.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2005

Performance Evaluation of the ATOLL Interconnect.
Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks, 2005

ATOLL: Performance and Cost Optimization of a San Interconnect.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2002
