Gabriel H. Loh

Michael J. Schulte

Mike Ignatowski

Vignesh Adhinarayanan

Kishore Punniyamurthy

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

The Next Era for Chiplet Innovation.

Raja Swaminathan

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2023

2021

Accelerating Variational Quantum Algorithms Using Circuit Concurrency.

CoRR, 2021

A New Era of Tailored Computing.

Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, June 13-19, 2021, 2021

Increasing GPU Translation Reach by Leveraging Under-Utilized On-Chip Resources.

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Pioneering Chiplet Technology and Design for the AMD EPYC™ and Ryzen™ Processor Families : Industrial Product.

Proceedings of the 48th ACM/IEEE Annual International Symposium on Computer Architecture, 2021

Analyzing and Leveraging Decoupled L1 Caches in GPUs.

Mohamed Assem Ibrahim

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

Understanding Chiplets Today to Anticipate Future Integration Opportunities and Limits.

Samuel Naffziger

Kevin Lepak

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

2020

Experiences with ML-Driven Design: A NoC Case Study.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Analyzing and Leveraging Shared L1 Caches in GPUs.

Mohamed Assem Ibrahim

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Efficient System Architecture in the Era of Monolithic 3D: Dynamic Inter-tier Interconnect and Processing-in-Memory.

Proceedings of the 56th Annual Design Automation Conference 2019, 2019

2018

CODA: Enabling Co-location of Computation and Data for Multiple GPU Systems.

ACM Trans. Archit. Code Optim., 2018

High-Performance and Energy-Efficient Memory Scheduler Design for Heterogeneous Systems.

CoRR, 2018

Holistic Management of the GPGPU Memory Hierarchy to Manage Warp-level Latency Tolerance.

CoRR, 2018

Challenges of High-Capacity DRAM Stacks and Potential Directions.

Amin Farmahini Farahani

Sudhanva Gurumurthi

Muhammad Shoaib Bin Altaf

Michael Ignatowski

Proceedings of the Workshop on Memory Centric High Performance Computing, 2018

Modular Routing Design for Chiplet-Based Systems.

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Generic System Calls for GPUs.

Ján Veselý

Arkaprava Basu

Mark Oskin

Steven K. Reinhardt

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Scheduling Page Table Walks for Irregular GPU Applications.

Arkaprava Basu

Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Machine learning for performance and power modeling of heterogeneous systems.

Joseph L. Greathouse

Proceedings of the International Conference on Computer-Aided Design, 2018

2017

CODA: Enabling Co-location of Computation and Data for Near-Data Processing.

CoRR, 2017

GPU System Calls.

Ján Veselý

Arkaprava Basu

Mark Oskin

Steven K. Reinhardt

CoRR, 2017

Leveraging near data processing for high-performance checkpoint/restart.

Abhinav Agrawal

Thiruvengadam Vijayaraghavan

James Tuck

Proceedings of the International Conference for High Performance Computing, 2017

There and Back Again: Optimizing the Interconnect in Networks of Memory Cubes.

Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Cost-effective design of scalable high-performance systems using active and passive interposers.

Proceedings of the 2017 IEEE/ACM International Conference on Computer-Aided Design, 2017

Design and Analysis of an APU for Exascale Computing.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

MemPod: A Clustered Architecture for Efficient and Scalable Migration in Flat Address Space Multi-level Memories.

Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Avoiding TLB Shootdowns Through Self-Invalidating TLB Entries.

Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016

A case for hierarchical rings with deflection routing: An energy-efficient on-chip communication substrate.

Parallel Comput., 2016

Exploiting Interposer Technologies to Disintegrate and Reintegrate Multicore Processors.

IEEE Micro, 2016

Enabling Efficient Dynamic Resizing of Large DRAM Caches via A Hardware Consistent Hashing Mechanism.

CoRR, 2016

Achieving both High Energy Efficiency and High Performance in On-Chip Communication using Hierarchical Rings with Deflection Routing.

CoRR, 2016

Building a Low Latency, Highly Associative DRAM Cache with the Buffered Way Predictor.

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures.

Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Observations and opportunities in architecting shared virtual memory for heterogeneous systems.

Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Efficient synthetic traffic models for large, complex SoCs.

Jieming Yin

Onur Kayiran

Matthew Poremba

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

μC-States: Fine-grained GPU Datapath Power Management.

Onur Kayiran

Adwait Jog

Ashutosh Pattnaik

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Design and Analysis of 3D-MAPS (3D Massively Parallel Processor with Stacked Memory).

IEEE Trans. Computers, 2015

Achieving Exascale Capabilities through Heterogeneous Computing.

IEEE Micro, 2015

Large pages and lightweight memory management in virtualized environments: can you have it both ways?

Binh Pham

Ján Veselý

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Enabling interposer-based disintegration of multi-core processors.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

HpMC: An Energy-aware Management System of Multi-level Memory Architectures.

Bronis R. de Supinski

Dimitrios S. Nikolopoulos

Proceedings of the 2015 International Symposium on Memory Systems, 2015

Interconnect-Memory Challenges for Multi-chip, Silicon Interposer Systems.

Yasuko Eckert

Proceedings of the 2015 International Symposium on Memory Systems, 2015

Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories.

Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

A Software-Managed Approach to Die-Stacked DRAM.

Mark Oskin

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Exploiting Inter-Warp Heterogeneity to Improve GPGPU Performance.

Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014

A Configurable and Strong RAS Solution for Die-Stacked DRAM Caches.

IEEE Micro, 2014

Toward efficient programmer-managed two-level memory hierarchies in exascale computers.

Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

Managing DRAM Latency Divergence in Irregular GPGPU Applications.

Rajeev Balasubramonian

Proceedings of the International Conference for High Performance Computing, 2014

Design and Evaluation of Hierarchical Rings with Deflection Routing.

Nachiappan Chidambaram Nachiappan

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Managing GPU Concurrency in Heterogeneous Architectures.

Onur Kayiran

Adwait Jog

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache.

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

NoC Architectures for Silicon Interposer Systems: Why Pay for more Wires when you Can Get them (from your interposer) for Free?

Zimo Li

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

Efficient RAS support for die-stacked DRAM.

Hyeran Jeon

Murali Annavaram

Proceedings of the 2014 International Test Conference, 2014

Last-level cache deduplication.

Proceedings of the 2014 International Conference on Supercomputing, 2014

Increasing TLB reach by exploiting clustering in page translations.

Binh Pham

Yasuko Eckert

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013

Guest Editorial.

Yuan Xie

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2013

Optimizing GPU energy efficiency with 3D die-stacking graphics memory and reconfigurable memory interface.

ACM Trans. Archit. Code Optim., 2013

Top Picks from the 2012 Computer Architecture Conferences.

Babak Falsafi

IEEE Micro, 2013

Resilient die-stacked DRAM caches.

Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

2012

Supporting Very Large DRAM Caches with Compound-Access Scheduling and MissMap.

Mark D. Hill

IEEE Micro, 2012

Exploiting New Interconnect Technologies in On-Chip Communication.

John Kim

Kiyoung Choi

IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Guest Editorial New Interconnect Technologies in On-Chip Communication.

Kiyoung Choi

John Kim

IEEE J. Emerg. Sel. Topics Circuits Syst., 2012

Computer architecture for die stacking.

Proceedings of Technical Program of 2012 VLSI Design, Automation and Test, 2012

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch.

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design.

Moinuddin K. Qureshi

Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

3D-MAPS: 3D Massively parallel processor with stacked memory.

Proceedings of the 2012 IEEE International Solid-State Circuits Conference, 2012

Energy-efficient GPU design with reconfigurable in-package graphics memory.

Proceedings of the International Symposium on Low Power Electronics and Design, 2012

Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems.

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

2011

Efficiently enabling conventional block sizes for very large die-stacked DRAM caches.

Mark D. Hill

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

A register-file approach for row buffer caches in die-stacked DRAMs.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Preventing PCM banks from seizing too much power.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Thread-aware dynamic shared cache compression in multi-core processors.

Proceedings of the IEEE 29th International Conference on Computer Design, 2011

2010

3D Stacked Microprocessor: Are We There Yet?

Yuan Xie

IEEE Micro, 2010

Use ECP, not ECC, for hard failures in resistive memories.

Proceedings of the 37th International Symposium on Computer Architecture (ISCA 2010), 2010

Scalable Shared-Cache Management by Containing Thrashing Workloads.

Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Quantifying and coping with parametric variations in 3D-stacked microarchitectures.

Proceedings of the 47th Design Automation Conference, 2010

Design and analysis of 3D-MAPS: A many-core 3D processor with stacked memory.

Proceedings of the IEEE Custom Integrated Circuits Conference, 2010

2009

3D-Integrated SRAM Components for High-Performance Microprocessors.

IEEE Trans. Computers, 2009

Design and optimization of the store vectors memory dependence predictor.

ACM Trans. Archit. Code Optim., 2009

Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Zesto: A cycle-level simulator for highly detailed microarchitecture exploration.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches.

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Criticality-based optimizations for efficient load processing.

Anne Bracy

Hong Wang

Proceedings of the 15th International Conference on High-Performance Computer Architecture (HPCA-15 2009), 2009

Thermal optimization in multi-granularity multi-core floorplanning.

Proceedings of the 14th Asia South Pacific Design Automation Conference, 2009

2008

Modulo Path History for the Reduction of Pipeline Overheads in Path-based Neural Branch Predictors.

Daniel A. Jiménez

Int. J. Parallel Program., 2008

A Segmented Bloom Filter Algorithm for Efficient Predictors.

Maurício Breternitz Jr.

Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

3D-Stacked Memory Architectures for Multi-core Processors.

Proceedings of the 35th International Symposium on Computer Architecture (ISCA 2008), 2008

PEEP: Exploiting predictability of memory dependences in SMT processors.

Milos Prvulovic

Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008

A modular 3d processor for flexible product design and technology migration.

Proceedings of the 5th Conference on Computing Frontiers, 2008

2007

Static strands: Safely exposing dependence chains for increasing embedded power efficiency.

Peter G. Sassone

D. Scott Wills

Chinnakrishnan S. Ballapuram

ACM Trans. Embed. Comput. Syst., 2007

Multiobjective Microarchitectural Floorplanning for 2-D and 3-D ICs.

Michael B. Healy

Mario Vittes

Mongkol Ekpanyapong

Sung Kyu Lim

Hsien-Hsin S. Lee

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2007

Processor Design in 3D Die-Stacking Technologies.

Yuan Xie

Bryan Black

IEEE Micro, 2007

Matrix scheduler reloaded.

Proceedings of the 34th International Symposium on Computer Architecture (ISCA 2007), 2007

Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors.

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Scalability of 3D-Integrated Arithmetic Units in High-Performance Microprocessors.

Proceedings of the 44th Design Automation Conference, 2007

2006

Design space exploration for 3D architectures.

ACM J. Emerg. Technol. Comput. Syst., 2006

Controlling the Power and Area of Neural Branch Predictors for Practical Implementation in High-Performance Processors.

Daniel A. Jiménez

Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

Adaptive Caches: Effective Shaping of Cache Behavior to Workloads.

Ranjith Subramanian

Yannis Smaragdakis

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Fire-and-Forget: Load/Store Scheduling with No Store Queue at All.

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Die Stacking (3D) Microarchitecture.

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-39 2006), 2006

Implementing Register Files for High-Performance Microprocessors in a Die-Stacked (3D) Technology.

Proceedings of the 2006 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2006), 2006

Revisiting the performance impact of branch predictor latencies.

Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

The impact of 3-dimensional integration on the design of arithmetic units.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Store vectors for scalable memory dependence prediction and scheduling.

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

Dynamic instruction schedulers in a 3-dimensional integration technology.

Proceedings of the 16th ACM Great Lakes Symposium on VLSI 2006, Philadelphia, PA, USA, April 30, 2006

Thermal analysis of a 3D die-stacked high-performance microprocessor.

Chinnakrishnan S. Ballapuram

Proceedings of the 16th ACM Great Lakes Symposium on VLSI 2006, Philadelphia, PA, USA, April 30, 2006

Microarchitectural floorplanning under performance and thermal tradeoff.

Michael B. Healy

Mario Vittes

Mongkol Ekpanyapong

Sung Kyu Lim

Hsien-Hsin S. Lee

Chinnakrishnan S. Ballapuram

Proceedings of the Conference on Design, Automation and Test in Europe, 2006

Entropy-based low power data TLB design.

Hsien-Hsin S. Lee

Proceedings of the 2006 International Conference on Compilers, 2006

2005

Deconstructing the Frankenpredictor for Implementable Branch Predictors.

J. Instr. Level Parallelism, 2005

Static strands: safely collapsing dependence chains for increasing embedded power efficiency.

Peter G. Sassone

D. Scott Wills

Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, 2005

Simulation Differences Between Academia and Industry: A Branch Prediction Case Study.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Implementing Caches in a 3D Technology for High Performance Processors.

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction.

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2003

Exploiting Bias in the Hysteresis Bit of 2-bit Saturating Counters in Branch Predictors.

Arvind Krishnamurthy

J. Instr. Level Parallelism, 2003

Width-Partitioned Load Value Predictors.

J. Instr. Level Parallelism, 2003

2002

A Comparison of Asymptotically Scalable Superscalar Processors.

Bradley C. Kuszmaul

Theory Comput. Syst., 2002

Exploiting data-width locality to increase superscalar execution bandwidth.

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Speculative Clustered Caches for Clustered Processors.

Rahul Sami

Proceedings of the High Performance Computing, 4th International Symposium, 2002

Applying Machine Learning for Ensemble Branch Predictors.

Proceedings of the Developments in Applied Artificial Intelligence, 2002

Predicting Conditional Branches With Fusion-Based Hybrid Predictors.

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001

A time-stamping algorithm for efficient performance estimation of superscalar processors.

Proceedings of the Joint International Conference on Measurements and Modeling of Computer Systems, 2001

2000

Circuits for wide-window superscalar processors.

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

1999

A Comparison of Scalable Superscalar Processors.

Bradley C. Kuszmaul