2022
Locality-Aware CTA Scheduling for Gaming Applications.
ACM Trans. Archit. Code Optim., 2022
GPU Subwarp Interleaving.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
2021
Corrections to "Countering Load-to-Use Stalls in the NVIDIA Turing GPU".
IEEE Micro, 2021
Cooperative Profile Guided Optimizations.
Comput. Graph. Forum, 2021
PGZ: automatic zero-value code specialization.
Proceedings of the CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, 2021
2020
Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications.
ACM Trans. Archit. Code Optim., 2020
Countering Load-to-Use Stalls in the NVIDIA Turing GPU.
IEEE Micro, 2020
AZP: Automatic Specialization for Zero Values in Gaming Applications.
CoRR, 2020
2013
Mesoscale performance simulation of multicore processor systems.
Softw. Syst. Model., 2013
2010
Statistically regulating program behavior via mainstream computing.
Proceedings of CGO 2010, 2010
2009
Lightweight predication support for out of order processors.
Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009
2008
Performance scalability of decoupled software pipelining.
ACM Trans. Archit. Code Optim., 2008
Spice: speculative parallel iteration chunk execution.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008
2007
Speculative Decoupled Software Pipelining.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007
2006
From sequential programs to concurrent threads.
IEEE Comput. Archit. Lett., 2006
Support for High-Frequency Streaming in CMPs.
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006
2005
Software-controlled fault tolerance.
ACM Trans. Archit. Code Optim., 2005
Hardware-modulated parallelism in chip multiprocessors.
SIGARCH Comput. Archit. News, 2005
Automatic Thread Extraction with Decoupled Software Pipelining.
Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-38 2005), 2005
Design and Evaluation of Hybrid Fault-Detection Systems.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
Computing Architectural Vulnerability Factors for Address-Based Structures.
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
SWIFT: Software Implemented Fault Tolerance.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005
2004
RIFLE: An Architectural Framework for User-Centric Information-Flow Security.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004
Decoupled Software Pipelining with the Synchronization Array.
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT 2004), 29 September, 2004