Onur Kayiran

Shaizeen Aga

Proceedings of the MEMSYS 2021: The International Symposium on Memory Systems, Washington, USA, September 27, 2021

Analyzing and Leveraging Decoupled L1 Caches in GPUs.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

Analyzing and Leveraging Shared L1 Caches in GPUs.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Optimizing GPU Cache Policies for MI Workloads.

CoRR, 2019

Opportunistic computing in GPU architectures.

Anand Sivasubramaniam

Chita R. Das

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Optimizing GPU Cache Policies for MI Workloads.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

Analyzing and Leveraging Remote-Core Bandwidth for Enhanced Performance in GPUs.

Hongyuan Liu

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems.

ACM Trans. Archit. Code Optim., 2018

Quantifying Data Locality in Dynamic Parallelism in GPUs.

Mahmut Taylan Kandemir

Chita R. Das

Proc. ACM Meas. Anal. Comput. Syst., 2018

Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.

CoRR, 2018

Architectural Support for Efficient Large-Scale Automata Processing.

Hongyuan Liu

Sreepathi Pai

Muhammad Shoaib Bin Altaf

Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Modular Routing Design for Chiplet-Based Systems.

Natalie D. Enright Jerger

Gabriel H. Loh

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management.

Haonan Wang

Fan Luo

Thiruvengadam Vijayaraghavan

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

CODA: Enabling Co-location of Computation and Data for Near-Data Processing.

CoRR, 2017

There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Design and Analysis of an APU for Exascale Computing.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Controlled Kernel Launch for Dynamic Parallelism in GPUs.

Mahmut T. Kandemir

Chita R. Das

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

2016

Exploiting Core Criticality for Enhanced GPU Performance.

Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, 2016

OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Prefetching Techniques for Near-memory Throughput Processors.

Proceedings of the 2016 International Conference on Supercomputing, 2016

Efficient synthetic traffic models for large, complex SoCs.

Jieming Yin

Natalie D. Enright Jerger

Matthew Poremba

Gabriel H. Loh

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

μC-States: Fine-grained GPU Datapath Power Management.

Ashutosh Pattnaik

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Anatomy of GPU Memory System for Multi-Application Execution.

Proceedings of the 2015 International Symposium on Memory Systems, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

Managing GPU Concurrency in Heterogeneous Architectures.

Nachiappan Chidambaram Nachiappan

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

2013

Orchestrated scheduling and prefetching for GPGPUs.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance.
