Michael Pellauer

ORCID: 0000-0002-5305-4307

According to our database, Michael Pellauer authored at least 55 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design.
CoRR, 2024

Characterizing the Accuracy-Efficiency Trade-off of Low-rank Decomposition in Language Models.
CoRR, 2024

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators (Abstract).
Proceedings of the 2024 ACM Workshop on Highlights of Parallel Computing, 2024

2023
Symphony: Orchestrating Sparse and Dense Tensors with Hierarchical Heterogeneous Processing.
ACM Trans. Comput. Syst., 2023

Exploiting Inter-Operation Data Reuse in Scientific Applications using GOGETA.
CoRR, 2023

TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling (Extended Abstract).
Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, 2023

Optimizing Compression Schemes for Parallel Sparse Tensor Algebra.
Proceedings of the Data Compression Conference, 2023

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Flexagon: A Multi-dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators.
ACM Trans. Archit. Code Optim., 2022

A Formalism of DNN Accelerator Flexibility.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Enabling Flexibility for Sparse Tensor Acceleration via Heterogeneity.
CoRR, 2022

DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators.
Proceedings of the 2022 Design, Automation & Test in Europe Conference & Exhibition, 2022

Self adaptive reconfigurable arrays (SARA): learning flexible GEMM accelerator configuration and mapping-space using ML.
Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022

2021
Self-Adaptive Reconfigurable Arrays (SARA): Using ML to Assist Scaling GEMM Acceleration.
CoRR, 2021

Flexion: A Quantitative Metric for Flexibility in DNN Accelerators.
IEEE Comput. Archit. Lett., 2021

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Data Orchestration in Deep Learning Accelerators.
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01767-4, 2020

MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings.
IEEE Micro, 2020

2019
Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

ExTensor: An Accelerator for Sparse Tensor Algebra.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

2018
Full-Stack Memory Model Verification with TriCheck.
IEEE Micro, 2018

MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators.
CoRR, 2018

UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

2017
RTLcheck: verifying the memory consistency of RTL designs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
Exploring the Trisection of Software, Hardware, and ISA in Memory Model Design.
CoRR, 2016

Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings.
CoRR, 2016

2015
Efficient Control and Communication Paradigms for Coarse-Grained Spatial Architectures.
ACM Trans. Comput. Syst., 2015

Verifying Correct Microarchitectural Enforcement of Memory Consistency Models.
IEEE Micro, 2015

CCICheck: using µhb graphs to verify the coherence-consistency interface.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

ArMOR: defending against memory consistency model mismatches in heterogeneous architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2014
Efficient Spatial Processing Element Control via Triggered Instructions.
IEEE Micro, 2014

PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

2013
Triggered instructions: a control paradigm for spatially-programmed architectures.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

A Hierarchical Architectural Framework for Reconfigurable Logic Computing.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Heracles: a tool for fast RTL-based design space exploration of multicore processors.
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013

2012
Leveraging latency-insensitivity to ease multiple FPGA design.
Proceedings of the ACM/SIGDA 20th International Symposium on Field Programmable Gate Arrays, 2012

2011
Cycle-accurate multicore performance models on FPGAs.
PhD thesis, 2011

HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Heracles: Fully Synthesizable Parameterized MIPS-Based Multicore System.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2011

Leap scratchpads: automatic memory and cache management for reconfigurable logic.
Proceedings of the ACM/SIGDA 19th International Symposium on Field Programmable Gate Arrays, 2011

2010
Design contest overview: Combined architecture for network stream categorization and intrusion detection (CANSCID).
Proceedings of the 8th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010), 2010

A design flow based on modular refinement.
Proceedings of the 8th ACM/IEEE International Conference on Formal Methods and Models for Codesign (MEMOCODE 2010), 2010

2009
A-Port Networks: Preserving the Timed Behavior of Synchronous Systems for Modeling on FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2009

Soft connections: addressing the hardware-design modularity problem.
Proceedings of the 46th Design Automation Conference, 2009

2008
Quick Performance Models Quickly: Closely-Coupled Partitioned Simulation on FPGAs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

A-Ports: an efficient abstraction for cycle-accurate performance models on FPGAs.
Proceedings of the ACM/SIGDA 16th International Symposium on Field Programmable Gate Arrays, 2008

2007
Hardware Acceleration of Matrix Multiplication on a Xilinx FPGA.
Proceedings of the 5th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2007), 2007

Scheduling as Rule Composition.
Proceedings of the 5th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2007), 2007

2006
802.11a transmitter: a case study in microarchitectural exploration.
Proceedings of the 4th ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2006), 2006

2005
Synthesis of synchronous assertions with guarded atomic actions.
Proceedings of the 3rd ACM & IEEE International Conference on Formal Methods and Models for Co-Design (MEMOCODE 2005), 2005
