Static Instruction Scheduling for High Performance on Limited Hardware.
IEEE Transactions on Computers, 2018
SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018
Transcending Hardware Limits with Software Out-of-Order Processing.
IEEE Computer Architecture Letters, 2017
Clairvoyance: look-ahead compile-time scheduling.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017
Efficient Execution Paradigms for Parallel Heterogeneous Architectures.
PhD thesis, 2016
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead.
ACM Transactions on Architecture and Code Optimization, 2016
Profiling-Assisted Decoupled Access-Execute.
CoRR, 2016
Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs.
Proceedings of the 25th International Conference on Compiler Construction, 2016
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014
Towards more efficient execution: a decoupled access-execute approach.
Proceedings of the International Conference on Supercomputing, 2013
Tagged Procedure Calls (TPC): Efficient Runtime Support for Task-Based Parallelism on the Cell Processor.
Proceedings of the International Conference on High Performance Embedded Architectures and Compilers, 2010