Scott A. Mahlke
ORCID: 0000-0002-0438-0616
Affiliations:
- University of Michigan, Ann Arbor, MI, USA
According to our database, Scott A. Mahlke authored at least 236 papers between 1991 and 2024.
Awards
ACM Fellow
ACM Fellow 2020, "For contributions in compiler code generation for instruction level parallelism, and customized microprocessor architectures".
IEEE Fellow
IEEE Fellow 2015, "For contributions to compiler code generation and automatic processor customization".
Bibliography
2024
LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme.
CoRR, 2024
SlimSLAM: An Adaptive Runtime for Visual-Inertial Simultaneous Localization and Mapping.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024
2023
BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks.
ACM Trans. Embed. Comput. Syst., October, 2023
Proceedings of the IEEE International Symposium on Workload Characterization, 2023
2022
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022
AVMaestro: A Centralized Policy Enforcement Framework for Safe Autonomous-driving Environments.
Proceedings of the 2022 IEEE Intelligent Vehicles Symposium, 2022
SoftFusion: A Low-Cost Approach to Enhance Reliability of Object Detection Applications.
Proceedings of the IEEE 40th International Conference on Computer Design, 2022
SRTuner: Effective Compiler Optimization Customization by Exposing Synergistic Relations.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022
2021
A Systematic Framework to Identify Violations of Scenario-dependent Driving Rules in Autonomous Vehicle Software.
Proc. ACM Meas. Anal. Comput. Syst., 2021
Prodigy: Improving the Memory Latency of Data-Indirect Irregular Workloads Using Hardware-Software Co-Design.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021
2020
Proceedings of the 21st ACM SIGPLAN/SIGBED International Conference on Languages, 2020
Proceedings of the 2020 ACM Workshop on Forming an Ecosystem Around Software Transformation, 2020
AVGuardian: Detecting and Mitigating Publish-Subscribe Overprivilege for Autonomous Vehicle Systems.
Proceedings of the IEEE European Symposium on Security and Privacy, 2020
Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2020
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020
2019
ACM Trans. Embed. Comput. Syst., 2019
Multi-objective Exploration for Practical Optimization Decisions in Binary Translation.
ACM Trans. Embed. Comput. Syst., 2019
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019
Proceedings of the 46th International Symposium on Computer Architecture, 2019
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019
2018
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2018
Proceedings of the 32nd International Conference on Supercomputing, 2018
Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2018
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018
2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
DeftNN: addressing bottlenecks for DNN execution on GPUs via synapse vector elimination and near-compute data fission.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
2016
IEEE Trans. Computers, 2016
IEEE Des. Test, 2016
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016
Concise loads and stores: The case for an asymmetric compute-memory architecture for approximation.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016
Proceedings of the 35th International Conference on Computer-Aided Design, 2016
2015
J. Signal Process. Syst., 2015
ACM Trans. Comput. Syst., 2015
GetMobile Mob. Comput. Commun., 2015
ELF: maximizing memory-level parallelism for GPUs with coordinated warp and fetch scheduling.
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 13th Annual International Conference on Mobile Systems, 2015
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
ACM Trans. Comput. Syst., 2014
ACM Trans. Archit. Code Optim., 2014
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014
Proceedings of the Computing Frontiers Conference, CF'14, 2014
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014
2013
Eliminating Concurrency Bugs in Multithreaded Software: A New Approach Based on Discrete-Event Control.
IEEE Trans. Control. Syst. Technol., 2013
Optimal Liveness-Enforcing Control for a Class of Petri Nets Arising in Multithreaded Software.
IEEE Trans. Autom. Control., 2013
Discret. Event Dyn. Syst., 2013
Proceedings of the IEEE Workshop on Signal Processing Systems, 2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2013
Parallelization techniques for implementing trellis algorithms on graphics processors.
Proceedings of the 2013 IEEE International Symposium on Circuits and Systems (ISCAS2013), 2013
Proceedings of the IEEE International Symposium on Workload Characterization, 2013
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013
Efficient execution of augmented reality applications on mobile programmable accelerators.
Proceedings of the 2013 International Conference on Field-Programmable Technology, 2013
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013
Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013
2012
IEEE Trans. Computers, 2012
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012
Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, 2012
Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Dynamic acceleration of multithreaded program critical paths in near-threshold systems.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012
Efficient soft error protection for commodity embedded microprocessors using profile information.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2012
Proceedings of the 2012 International Conference on Field-Programmable Technology, 2012
Proceedings of the 49th Annual Design Automation Conference 2012, 2012
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012
Proceedings of the 15th International Conference on Compilers, 2012
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012
Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, 2012
2011
J. Signal Process. Syst., 2011
IEEE Trans. Computers, 2011
IEEE Trans. Computers, 2011
Bundled execution of recurring traces for energy-efficient general purpose processing.
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011
Dynamic parallelization of JavaScript applications using an ultra-lightweight speculation mechanism.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011
Archipelago: A polymorphic cache design for enabling robust near-threshold operation.
Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011
Proceedings of the CGO 2011, 2011
Deadlock-avoidance control of multithreaded software: An efficient siphon-based algorithm for Gadara petri nets.
Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference, 2011
Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems, 2011
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
2010
Supervisory control of software execution for failure avoidance: Experience from the Gadara project.
Proceedings of the 10th International Workshop on Discrete Event Systems, 2010
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010
Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010
Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010
Proceedings of the High Performance Embedded Architectures and Compilers, 2010
StageWeb: Interweaving pipeline stages into a wearout and variation tolerant CMP fabric.
Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks, 2010
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010
Synthesis of maximally-permissive liveness-enforcing control policies for Gadara petri nets.
Proceedings of the 49th IEEE Conference on Decision and Control, 2010
Proceedings of the 2010 International Conference on Compilers, 2010
Proceedings of the 2010 International Conference on Compilers, 2010
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, 2010
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010
2009
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009
Proceedings of the IEEE 7th Symposium on Application Specific Processors, 2009
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009
Proceedings of the 36th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2009
Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory.
Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, 2009
Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications.
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures.
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, 2009
Proceedings of the 2009 International Symposium on Low Power Electronics and Design, 2009
Proceedings of the 27th International Conference on Computer Design, 2009
Bridging the computation gap between programmable processors and hardwired accelerators.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
Proceedings of the CGO 2009, 2009
Gadara nets: Modeling and analyzing lock allocation for deadlock avoidance in multithreaded software.
Proceedings of the 48th IEEE Conference on Decision and Control, 2009
Proceedings of the 2009 International Conference on Compilers, 2009
Maximally permissive deadlock avoidance for multithreaded computer programs (Extended abstract).
Proceedings of the IEEE Conference on Automation Science and Engineering, 2009
Flextream: Adaptive Compilation of Streaming Applications for Heterogeneous Architectures.
Proceedings of the PACT 2009, 2009
2008
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008
Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008
Proceedings of the 41st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-41 2008), 2008
Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008
Proceedings of the IEEE International Conference on Acoustics, 2008
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
Proceedings of the 45th Design Automation Conference, 2008
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008
Proceedings of the 2008 International Conference on Compilers, 2008
StageNetSlice: a reconfigurable microarchitecture building block for resilient CMP systems.
Proceedings of the 2008 International Conference on Compilers, 2008
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008
2007
ACM Trans. Archit. Code Optim., 2007
IEEE Micro, 2007
Proceedings of the Embedded Computer Systems: Architectures, 2007
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007
Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications.
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007
Proceedings of the 2007 International Conference on Compilers, 2007
2006
Proceedings of the IEEE Workshop on Signal Processing Systems, 2006
Proceedings of the 33rd International Symposium on Computer Architecture (ISCA 2006), 2006
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006
Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, 2006
Proceedings of the Fourth IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2006), 2006
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures.
Proceedings of the 2006 International Conference on Compilers, 2006
Proceedings of the 2006 International Conference on Compilers, 2006
Proceedings of the 2006 International Conference on Compilers, 2006
2005
Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor.
IEEE Trans. Computers, 2005
IEEE Trans. Computers, 2005
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
Proceedings of the High Performance Embedded Architectures and Compilers, 2005
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005
Proceedings of the 2005 International Conference on Compilers, 2005
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005
2004
Cost-Sensitive Partitioning in an Architecture Synthesis System for Multicluster Processors.
IEEE Micro, 2004
Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004
Proceedings of the Languages and Compilers for High Performance Computing, 2004
Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004
Automatic Synthesis of Customized Local Memories for Multicluster Application Accelerators.
Proceedings of the 15th IEEE International Conference on Application-Specific Systems, 2004
2003
Automatic Design of Application Specific Instruction Set Extensions Through Dataflow Graph Exploration.
Int. J. Parallel Program., 2003
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003
Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003
Increasing the number of effective registers in a low-power processor using a windowed register file.
Proceedings of the International Conference on Compilers, 2003
Proceedings of the International Conference on Compilers, 2003
Proceedings of the 14th IEEE International Conference on Application-Specific Systems, 2003
2002
J. VLSI Signal Process., 2002
2001
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001
2000
Code size minimization and retargetable assembly for custom EPIC and VLIW instruction formats.
ACM Trans. Design Autom. Electr. Syst., 2000
Proceedings of the 12th IEEE International Conference on Application-Specific Systems, 2000
1999
The Partial Reverse If-Conversion Framework for Balancing Control Flow and Predication.
Int. J. Parallel Program., 1999
Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1999
Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999
Proceedings of the 26th Annual International Symposium on Computer Architecture, 1999
1998
Proceedings of the 25 Years of the International Symposia on Computer Architecture (Selected Papers), 1998
Proceedings of the 25th Annual International Symposium on Computer Architecture, 1998
1997
PhD thesis, 1997
Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997
1996
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996
1995
IEEE Trans. Computers, 1995
The Importance of Prepass Code Scheduling for Superscalar and Superpipelined Processors.
IEEE Trans. Computers, 1995
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
A study of the effects of compiler-controlled speculation on instruction and data caches.
Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995
1994
Proceedings of the 27th Annual International Symposium on Microarchitecture, San Jose, California, USA, November 30, 1994
Proceedings of ASPLOS-VI, 1994
1993
ACM Trans. Comput. Syst., 1993
J. Supercomput., 1993
Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), 1993
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993
Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993
Register Connection: A New Approach to Adding Registers into Instruction Set Architectures.
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
1992
Proceedings of Supercomputing '92, 1992
Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992
Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992
Proceedings of the Languages and Compilers for Parallel Computing, 1992
Proceedings of the 6th international conference on Supercomputing, 1992
Tolerating First Level Memory Access Latency in High-Performance Systems.
Proceedings of the 1992 International Conference on Parallel Processing, 1992
Proceedings of ASPLOS-V, 1992
1991
Softw. Pract. Exp., 1991
Data Access Microarchitectures for Superscalar Processors with Compiler-Assisted Data Prefetching.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991
Comparing Static and Dynamic Code Scheduling for Multiple-Instruction-Issue Processors.
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991
The Effect of Compiler Optimizations on Available Parallelism in Scalar Programs.
Proceedings of the International Conference on Parallel Processing, 1991