Timothy G. Rogers

CoRR, 2024

Integrating RISC-V SIMT and Scalar Cores: Loosely to Tightly Coupled.

Proceedings of the High Performance Computing. ISC High Performance 2024 International Workshops, 2024

Concurrency-Aware Register Stacks for Efficient GPU Function Calls.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Extending GPU Ray-Tracing Units for Hierarchical Search Acceleration.

Aaron Barnes

Fangjia Shen

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

ThreadFuser: A SIMT Analysis Framework for MIMD Programs.

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

CRISP: Concurrent Rendering and Compute Simulation Platform for GPUs.

Junrui Pan

Proceedings of the IEEE International Symposium on Workload Characterization, 2024

2023

Mitigating GPU Core Partitioning Performance Effects.

Aaron Barnes

Fangjia Shen

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

2022

SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices.

Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

A SIMT Analyzer for Multi-Threaded CPU Applications.

Ahmad Alawneh

Mahmoud Khairy

Proceedings of the International IEEE Symposium on Performance Analysis of Systems and Software, 2022

2021

AccelWattch: A Power Modeling Framework for Modern GPUs.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Characterizing Massively Parallel Polymorphism.

Ahmad Alawneh

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Deadline-Aware Offloading for High-Throughput Accelerators.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Judging a type by its pointer: optimizing GPU virtual functions.

Ahmad Alawneh

Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021

2020

Locality-Centric Data and Threadblock Management for Massive GPUs.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Deterministic Atomic Buffering.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Accel-Sim: An Extensible Simulation Framework for Validated GPU Modeling.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

Dimensionality-Aware Redundant SIMT Instruction Elimination.

Tsung Tai Yeh

Roland N. Green

Proceedings of the ASPLOS '20: Architectural Support for Programming Languages and Operating Systems, 2020

2019

Pagoda: A GPU Runtime System for Narrow Tasks.

ACM Trans. Parallel Comput., 2019

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

A Detailed Model for Contemporary GPU Memory Systems.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

POSTER: Quantifying the Direct Overhead of Virtual Function Calls on Massively Parallel Architectures.

Roland N. Green

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

General-Purpose Graphics Processor Architectures.

Wilson Wai Lun Fung

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01759-9, 2018

A Quantitative Evaluation of Contemporary GPU Simulation Methodology.

Akshay Jain

Mahmoud Khairy

Proc. ACM Meas. Anal. Comput. Syst., 2018

Exploring Modern GPU Memory System Design Challenges through Accurate Modeling.

CoRR, 2018

Characterizing the Runtime Effects of Object-Oriented Workloads on GPUs.

Roland N. Green

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

Pagoda: Fine-Grained GPU Resource Virtualization for Narrow Tasks.

Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

2016

POSTER: Pagoda: A Runtime System to Maximize GPU Utilization in Data Parallel Tasks with Limited Parallelism.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

A variable warp size architecture.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2014

Learning your limit: managing massively multithreaded caches through scheduling.

Commun. ACM, 2014

2013

Cache-Conscious Thread Scheduling for Massively Multithreaded Processors.

IEEE Micro, 2013

Divergence-aware warp scheduling.

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

2012

Cache-Conscious Wavefront Scheduling.
