Profile Guided Optimization without Profiles: A Machine Learning Approach.
CoRR, 2021
Warrior1: A Performance Sanitizer for C++.
CoRR, 2020
Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications.
CoRR, 2018
Glow: Graph Lowering Compiler Techniques for Neural Networks.
CoRR, 2018
Block Unification IF-conversion for High Performance Architectures.
IEEE Comput. Archit. Lett., 2014
Optimizing Wait States in the Synthesis of Memory References with Unpredictable Latencies.
ACM Trans. Reconfigurable Technol. Syst., 2013
The benefits of using variable-length pipelined operations in high-level synthesis.
ACM Trans. Embed. Comput. Syst., 2013
Using memory profile analysis for automatic synthesis of pointers code.
ACM Trans. Embed. Comput. Syst., 2013
Hybrid type legalization for a sparse SIMD instruction set.
ACM Trans. Archit. Code Optim., 2013
Combining static and dynamic array detection for binary synthesis with multiple memory ports.
Des. Autom. Embed. Syst., 2011
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs.
ACM Trans. Reconfigurable Technol. Syst., 2010
Finding the best compromise in compiling compound loops to Verilog.
J. Syst. Archit., 2010
Automatic memory partitioning: increasing memory parallelism via data structure partitioning.
Proceedings of the 8th International Conference on Hardware/Software Codesign and System Synthesis, 2010
The effect of unrolling and inlining for Python bytecode optimizations.
Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference 2009, 2009
Binary Synthesis with multiple memory banks targeting array references.
Proceedings of the 19th International Conference on Field Programmable Logic and Applications, 2009
Synthesis for variable pipelined function units.
Proceedings of the 2008 IEEE International Symposium on System-on-Chip, 2008