Markus Püschel

Orcid: 0000-0001-8834-8551

  • ETH Zurich, Switzerland

According to our database, Markus Püschel authored at least 175 papers between 1997 and 2024.

Floating-Point TVPI Abstract Domain.
Proc. ACM Program. Lang., 2024

Learning Signals and Graphs from Time-Series Graph Data with Few Causes.
Proceedings of the IEEE International Conference on Acoustics, 2024

Causal Fourier Analysis on Directed Acyclic Graphs and Posets.
IEEE Trans. Signal Process., 2023

The CausalBench challenge: A machine learning contest for gene network inference from single-cell perturbation data.
CoRR, 2023

QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models.
CoRR, 2023

Learning DAGs from Data with Few Root Causes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Möbius Total Variation for Directed Acyclic Graphs.
Proceedings of the IEEE International Conference on Acoustics, 2023

PRIMA: general and precise neural network certification via scalable convex hull approximations.
Proc. ACM Program. Lang., 2022

Fast Möbius and Zeta Transforms.
CoRR, 2022

Fourier Analysis-based Iterative Combinatorial Auctions.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Porting Signal Processing from Undirected to Directed Graphs: Case Study Signal Denoising with Unrolling Networks.
Proceedings of the 30th European Signal Processing Conference, 2022

A Compiler for Sound Floating-Point Computations using Affine Arithmetic.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Digraph Signal Processing With Generalized Boundary Conditions.
IEEE Trans. Signal Process., 2021

Discrete Signal Processing with Set Functions.
IEEE Trans. Signal Process., 2021

Discrete Signal Processing on Meet/Join Lattices.
IEEE Trans. Signal Process., 2021

Precise Multi-Neuron Abstractions for Neural Network Certification.
CoRR, 2021

Scaling Polyhedral Neural Network Verification on GPUs.
Proceedings of the Fourth Conference on Machine Learning and Systems, 2021

Wiener Filter on Meet/Join Lattices.
Proceedings of the IEEE International Conference on Acoustics, 2021

Faster Parallel Training of Word Embeddings.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

An Interval Compiler for Sound Floating-Point Computations.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

Compressive Sensing Using Iterative Hard Thresholding With Low Precision Data Representation: Theory and Applications.
IEEE Trans. Signal Process., 2020

DSL-Based Hardware Generation with Scala: Example Fast Fourier Transforms and Sorting Networks.
ACM Trans. Reconfigurable Technol. Syst., 2020

Neural Network Robustness Verification on GPUs.
CoRR, 2020

Learning fast and precise numerical analysis.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

Diagonalizable Shift and Filters for Directed Graphs Based on the Jordan-Chevalley Decomposition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Finite-Time In-Network Computation of Linear Transforms.
Proceedings of the 54th Asilomar Conference on Signals, Systems, and Computers, 2020

An abstract domain for certifying neural networks.
Proc. ACM Program. Lang., 2019

Program Generation for Linear Algebra Using Multiple Layers of DSLs.
CoRR, 2019

Powerset Convolutional Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Beyond the Single Neuron Convex Barrier for Neural Network Certification.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Boosting Robustness Certification of Neural Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

In Search of the Optimal Walsh-Hadamard Transform for Streamed Parallel Processing.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Discrete Signal Processing Framework for Meet/Join Lattices with Applications to Hypergraphs and Trees.
Proceedings of the IEEE International Conference on Acoustics, 2019

On Linear Learning with Manycore Processors.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

A stage-polymorphic IR for compiling MATLAB-style dynamic tensor expressions.
Proceedings of the 18th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2019

Sampling Signals On Meet/Join Lattices.
Proceedings of the 2019 IEEE Global Conference on Signal and Information Processing, 2019

DSL-Based Modular IP Core Generators: Example FFT and Related Structures.
Proceedings of the 26th IEEE Symposium on Computer Arithmetic, 2019

SPIRAL: Extreme Performance Portability.
Proc. IEEE, 2018

A practical construction for decomposing numerical abstract domains.
Proc. ACM Program. Lang., 2018

Fast Quantized Arithmetic on x86: Trading Compute for Data Movement.
Proceedings of the 2018 IEEE International Workshop on Signal Processing Systems, 2018

Fast and Effective Robustness Certification.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

A Discrete Signal Processing Framework for Set Functions.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A DSL-Based FFT Hardware Generator in Scala.
Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

Memory-Efficient Fast Fourier Transform on Streaming Data by Fusing Permutations.
Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2018

SIMD intrinsics on managed language runtimes.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Program generation for small-scale linear algebra applications.
Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018

Fast Numerical Program Analysis with Reinforcement Learning.
Proceedings of the Computer Aided Verification - 30th International Conference, 2018

Characterizing and Enumerating Walsh-Hadamard Transform Algorithms.
CoRR, 2017

Fast polyhedra abstract domain.
Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, 2017

Staging for generic programming in space and time.
Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2017

Optimal Streamed Linear Permutations.
Proceedings of the 24th IEEE Symposium on Computer Arithmetic, 2017

Streaming Sorting Networks.
ACM Trans. Design Autom. Electr. Syst., 2016

e-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem.
J. Mach. Learn. Res., 2016

RandIR: differential testing for embedded compilers.
Proceedings of the 7th ACM SIGPLAN Symposium on Scala, 2016

Program generation for performance.
Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, 2016

Optimal Circuits for Streamed Linear Permutations Using RAM.
Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016

A basic linear algebra compiler for structured matrices.
Proceedings of the 2016 International Symposium on Code Generation and Optimization, 2016

Distributed Optimization With Local Domains: Applications in MPC and Network Flows.
IEEE Trans. Autom. Control., 2015

Go Meta! A Case for Generative Programming and DSLs in Performance Critical Systems.
Proceedings of the 1st Summit on Advances in Programming Languages, 2015

Making numerical program analysis fast.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

A basic linear algebra compiler for embedded processors.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

High-performance sparse fast Fourier transforms.
Proceedings of the 2014 IEEE Workshop on Signal Processing Systems, 2014

Abstracting Vector Architectures in Library Generators: Case Study Convolution Filters.
Proceedings of ARRAY'14: the 2014 ACM SIGPLAN International Workshop on Libraries, 2014

Applying the roofline model.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Extending the roofline model: Bottleneck analysis with microarchitectural constraints.
Proceedings of the 2014 IEEE International Symposium on Workload Characterization, 2014

Automatic locality-friendly interface extension of numerical functions.
Proceedings of the Generative Programming: Concepts and Experiences, 2014

A Basic Linear Algebra Compiler.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

D-ADMM: A Communication-Efficient Distributed Algorithm for Separable Optimization.
IEEE Trans. Signal Process., 2013

Active Learning for Multi-Objective Optimization.
Proceedings of the 30th International Conference on Machine Learning, 2013

Spiral in Scala: towards the systematic construction of generators for performance libraries.
Proceedings of the Generative Programming: Concepts and Experiences, 2013

Distributed compressed sensing algorithms: Completing the puzzle.
Proceedings of the IEEE Global Conference on Signal and Information Processing, 2013

A unified algorithmic approach to distributed optimization.
Proceedings of the IEEE Global Conference on Signal and Information Processing, 2013

Efficient Compression of QRS Complexes Using Hermite Expansion.
IEEE Trans. Signal Process., 2012

Algebraic Signal Processing Theory: 1-D Nearest Neighbor Models.
IEEE Trans. Signal Process., 2012

Distributed Basis Pursuit.
IEEE Trans. Signal Process., 2012

Computer Generation of Hardware for Linear Digital Signal Processing Transforms.
ACM Trans. Design Autom. Electr. Syst., 2012

Compiling math to fast code.
Proceedings of the ACM SIGPLAN 2012 Workshop on Partial Evaluation and Program Manipulation, 2012

"Smart" design space sampling to predict Pareto-optimal solutions.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2012

D-ADMM: A distributed algorithm for compressed sensing and other separable optimization problems.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Improving fixed-point accuracy of FFT cores in O-OFDM systems.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Computer generation of streaming sorting networks.
Proceedings of the 49th Annual Design Automation Conference 2012, 2012

ADMM for consensus on colored networks.
Proceedings of the 51st IEEE Conference on Decision and Control, 2012

Distributed ADMM for model predictive control and congestion control.
Proceedings of the 51st IEEE Conference on Decision and Control, 2012

Proceedings of the Encyclopedia of Parallel Computing, 2011

FFT (Fast Fourier Transform).
Proceedings of the Encyclopedia of Parallel Computing, 2011

Algebraic Signal Processing Theory: Cooley-Tukey-Type Algorithms for Polynomial Transforms Based on Induction.
SIAM J. Matrix Anal. Appl., 2011

Automatic performance programming.
Proceedings of the ACM Symposium on New Ideas in Programming and Reflections on Software, 2011

Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Compression of QRS complexes using Hermite expansion.
Proceedings of the IEEE International Conference on Acoustics, 2011

Basis Pursuit in sensor networks.
Proceedings of the IEEE International Conference on Acoustics, 2011

Real-time software implementation of an IEEE 802.11a baseband receiver on Intel multicore.
Proceedings of the IEEE International Conference on Acoustics, 2011

Systematic construction of real lapped tight frame transforms.
IEEE Trans. Signal Process., 2010

Algebraic signal processing theory: sampling for infinite and finite 1-D space.
IEEE Trans. Signal Process., 2010

Offline library adaptation using automatically generated heuristics.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Hardware implementation of the discrete Fourier transform with non-power-of-two problem size.
Proceedings of the IEEE International Conference on Acoustics, 2010

Computer Generation of Efficient Software Viterbi Decoders.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Program Composition and Optimization: An Introduction.
Proceedings of the Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond, 09.05., 2010

10191 Executive Summary - Program Composition and Optimization : Autotuning, Scheduling, Metaprogramming and Beyond.
Proceedings of the Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond, 09.05., 2010

10191 Abstracts Collection - Program Composition and Optimization : Autotuning, Scheduling, Metaprogramming and Beyond.
Proceedings of the Program Composition and Optimization: Autotuning, Scheduling, Metaprogramming and Beyond, 09.05., 2010

Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for Real DFTs.
IEEE Trans. Signal Process., 2009

Discrete Fourier transform on multicore.
IEEE Signal Process. Mag., 2009

Permuting streaming data using RAMs.
J. ACM, 2009

Automatic synthesis of high performance mathematical programs.
Proceedings of the Symbolic and Algebraic Computation, International Symposium, 2009

Computer generation of fast Fourier transforms for the Cell Broadband Engine.
Proceedings of the 23rd International Conference on Supercomputing, 2009

Bandit-based optimization on graphs with application to library performance tuning.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Generating high performance pruned FFT implementations.
Proceedings of the IEEE International Conference on Acoustics, 2009

Operator Language: A Program Generation Framework for Fast Kernels.
Proceedings of the Domain-Specific Languages, IFIP TC 2 Working Conference, 2009

Automatic generation of streaming datapaths for arbitrary fixed permutations.
Proceedings of the Design, Automation and Test in Europe, 2009

Computer Generation of General Size Linear Transform Libraries.
Proceedings of the CGO 2009, 2009

Automatic Tuning of Discrete Fourier Transforms Driven by Analytical Modeling.
Proceedings of the PACT 2009, 2009

Algebraic Signal Processing Theory: 1-D Space.
IEEE Trans. Signal Process., 2008

Algebraic Signal Processing Theory: Foundation and 1-D Time.
IEEE Trans. Signal Process., 2008

Algebraic Signal Processing Theory: Cooley-Tukey Type Algorithms for DCTs and DSTs.
IEEE Trans. Signal Process., 2008

Algebraic signal processing theory: Cooley-Tukey type algorithms on the 2-D hexagonal spatial lattice.
Appl. Algebra Eng. Commun. Comput., 2008

Axonal bouton modeling, detection and distribution analysis for the study of neural circuit organization and plasticity.
Proceedings of the 2008 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, 2008

Domain-specific library generation for parallel software and hardware platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Haar filter banks for 1-D space signals.
Proceedings of the IEEE International Conference on Acoustics, 2008

Alternatives to the discrete Fourier transform.
Proceedings of the IEEE International Conference on Acoustics, 2008

Formal datapath representation and manipulation for implementing DSP transforms.
Proceedings of the 45th Design Automation Conference, 2008

Generating SIMD Vectorized Permutations.
Proceedings of the Compiler Construction, 17th International Conference, 2008

System Demonstration of Spiral: Generator for High-Performance Linear Transform Libraries.
Proceedings of the Algebraic Methodology and Software Technology, 2008

Mechanical Derivation of Fused Multiply-Add Algorithms for Linear Transforms.
IEEE Trans. Signal Process., 2007

Algebraic Signal Processing Theory: 2-D Spatial Hexagonal Lattice.
IEEE Trans. Image Process., 2007

Time-Multiplexed Multiple-Constant Multiplication.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Multiplierless multiple constant multiplication.
ACM Trans. Algorithms, 2007

An Adaptive Multiresolution Approach to Fingerprint Recognition.
Proceedings of the International Conference on Image Processing, 2007

SIMD Vectorization of Non-Two-Power Sized FFTs.
Proceedings of the IEEE International Conference on Acoustics, 2007

FFT Compiler: from math to efficient hardware HLDVT invited short paper.
Proceedings of the IEEE International High Level Design Validation and Test Workshop, 2007

Performance/Energy Optimization of DSP Transforms on the XScale Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

How to Write Fast Numerical Code: A Small Introduction.
Proceedings of the Generative and Transformational Techniques in Software Engineering II, 2007

Can we teach computers to write fast libraries?
Proceedings of the Generative Programming and Component Engineering, 2007

Generating FPGA-Accelerated DFT Libraries.
Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, 2007

Algebraic Signal Processing Theory.
CoRR, 2006

A Rewriting System for the Vectorization of Signal Transforms.
Proceedings of the High Performance Computing for Computational Science, 2006

Tools and techniques for performance - FFT program generation for shared memory: SMP and multicore.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006

Automatic Performance Optimization of the Discrete Fourier Transform on Distributed Memory Computers.
Proceedings of the Parallel and Distributed Processing and Applications, 2006

Algebraic Derivation of General Radix Cooley-Tukey Algorithms for the Real Discrete Fourier Transform.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

The Algebraic Structure in Signal Processing: Time and Space.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Sampling Theorem Associated With the Discrete Cosine Transform.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Fast and accurate resource estimation of automatically generated custom DFT IP cores.
Proceedings of the ACM/SIGDA 14th International Symposium on Field Programmable Gate Arrays, 2006

Program generation for the all-pairs shortest path problem.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

SPIRAL: Code Generation for DSP Transforms.
Proc. IEEE, 2005

Special Issue on Program Generation, Optimization, and Platform Adaptation.
Proc. IEEE, 2005

Formal loop merging for signal transforms.
Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, 2005

Fourier transform for the spatial quincunx lattice.
Proceedings of the 2005 International Conference on Image Processing, 2005

Fourier transform for the directed quincunx lattice.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Performance analysis of the filtered backprojection image reconstruction algorithms.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Real, Tight Frames with Maximal Robustness to Erasures.
Proceedings of the 2005 Data Compression Conference (DCC 2005), 2005

Automatic generation of customized discrete Fourier transform IPs.
Proceedings of the 42nd Design Automation Conference, 2005

Special issue on computer algebra and signal processing: foreword by the guest editors.
J. Symb. Comput., 2004

Symmetry-based matrix factorization.
J. Symb. Comput., 2004

Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms.
Int. J. High Perform. Comput. Appl., 2004

Automatically Tuned FFTs for BlueGene/L's Double FPU.
Proceedings of the High Performance Computing for Computational Science, 2004

Custom-optimized multiplierless implementations of DSP algorithms.
Proceedings of the 2004 International Conference on Computer-Aided Design, 2004

Automatic cost minimization for multiplierless implementations of discrete signal transforms.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Automatic generation of implementations for DSP transforms on fused multiply-add architectures.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

The discrete triangle transform.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Automatically generated high-performance code for discrete wavelet transforms.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Multiple constant multiplication by time-multiplexed mapping of addition chains.
Proceedings of the 41st Design Automation Conference, 2004

The Algebraic Approach to the Discrete Cosine and Sine Transforms and Their Fast Algorithms.
SIAM J. Comput., 2003

Short Vector Code Generation for the Discrete Fourier Transform.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Cooley-Tukey FFT like algorithms for the DCT.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Fast automatic software implementations of FIR filters.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Short vector code generation and adaptation for DSP algorithms.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Toward efficient static analysis of finite-precision effects in DSP applications via affine arithmetic modeling.
Proceedings of the 40th Design Automation Conference, 2003

Decomposing Monomial Representations of Solvable Groups.
J. Symb. Comput., 2002

A SIMD Vectorizing Compiler for Digital Signal Processing Algorithms.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Automatic generation of fast discrete signal transforms.
IEEE Trans. Signal Process., 2001

Fast Automatic Generation of DSP Algorithms.
Proceedings of the Computational Science - ICCS 2001, 2001

In search of the optimal Walsh-Hadamard transform.
Proceedings of the IEEE International Conference on Acoustics, 2000

Fast Quantum Fourier Transforms for a Class of Non-Abelian Groups.
Proceedings of the Applied Algebra, 1999

Solving Puzzles Related to Permutation Groups.
Proceedings of the 1998 International Symposium on Symbolic and Algebraic Computation, 1998

Konstruktive Darstellungstheorie und Algorithmengenerierung.
PhD thesis, 1998

Decomposing a Permutation into a Conjugated Tensor Product.
Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, 1997
