Matthew D. Sinclair

Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives.

Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023

Improving the Scalability of GPU Synchronization Primitives.

IEEE Trans. Parallel Distributed Syst., 2023

Fifty Years of the International Symposium on Computer Architecture: A Data-Driven Retrospective.

Parthasarathy Ranganathan

IEEE Micro, 2023

Fifty Years of ISCA: A data-driven retrospective on key trends.

Gaurang Upasani

Parthasarathy Ranganathan

Adrian Sampson

CoRR, 2023

Integrating Per-Stream Stat Tracking into Accel-Sim.

Shichen Qiao

Xin Su

CoRR, 2023

Computation vs. Communication Scaling for Future Transformers on Future Hardware.

CoRR, 2023

Tale of Two Cs: Computation vs. Communication Scaling for Future Transformers on Future Hardware.

Proceedings of the IEEE International Symposium on Workload Characterization, 2023

2022

A Case for Fine-grain Coherence Specialization in Heterogeneous Systems.

ACM Trans. Archit. Code Optim., 2022

Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems.

Shivaram Venkataraman

Proceedings of the SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 2022

Demystifying BERT: System Design Implications.

Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Only Buffer When You Need To: Reducing On-chip GPU Traffic with Reconfigurable Local Atomic Buffers.

Preyesh Dalmia

Rohan Mahapatra

Daniel Rodrigues Carvalho

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021

Demystifying BERT: Implications for Accelerator Design.

CoRR, 2021

Enabling Reproducible and Agile Full-System Simulation.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

DENNI: Distributed Neural Network Inference on Severely Resource Constrained Edge Devices.

Proceedings of the IEEE International Performance, Computing, and Communications Conference, 2021

Deadline-Aware Offloading for High-Throughput Accelerators.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020

Inter-kernel Reuse-aware Thread Block Scheduling.

ACM Trans. Archit. Code Optim., 2020

The gem5 Simulator: Version 20.0+.

Amin Farmahini Farahani

Hamidreza Khaleghzadeh

CoRR, 2020

Deterministic Atomic Buffering.

Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

SeqPoint: Identifying Representative Iterations of Sequence-Based Neural Networks.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2020

Independent Forward Progress of Work-groups.

Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019

Optimizing GPU Cache Policies for MI Workloads.

CoRR, 2019

Analyzing Machine Learning Workloads Using a Detailed GPU Simulator.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2019

Optimizing GPU Cache Policies for MI Workloads.

Proceedings of the IEEE International Symposium on Workload Characterization, 2019

2018

HPVM: Heterogeneous Parallel Virtual Machine.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Spandex: A Flexible Interface for Efficient Heterogeneous Coherence.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Lost in Abstraction: Pitfalls of Analyzing GPUs at the Intermediate Language Level.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

Efficient Coherence and Consistency for Specialized Memory Hierarchies.

PhD thesis, 2017

Chasing Away RAts: Semantics and Evaluation for Relaxed Atomics on Heterogeneous Systems.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

HeteroSync: A Benchmark Suite for Fine-Grained Synchronization on Tightly Coupled GPUs.

Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

2016

GSI: A GPU Stall Inspector to Characterize the Sources of Memory Stalls for Tightly Coupled GPUs.

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

POSTER: hVISC: A Portable Abstraction for Heterogeneous Parallel Systems.

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Efficient GPU Synchronization without Scopes: Saying No to Complex Consistency Models.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Stash: Have Your Scratchpad and Cache It Too.

Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

2011

Sampling + DMR: Practical and Low-Overhead Permanent Fault Detection.

Shuou Nomura