Per Stenström

Proceedings of the 38th ACM International Conference on Supercomputing, 2024

DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators.

Piyumal Ranawaka

Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

2023

Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints.

ACM Trans. Archit. Code Optim., September, 2023

SCALE: Secure and Scalable Cache Partitioning.

Nadja Ramhöj Holtryd

Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023

SoK: Analysis of Root Causes and Defense Strategies for Attacks on Microarchitectural Optimizations.

Nadja Ramhöj Holtryd

Proceedings of the 8th IEEE European Symposium on Security and Privacy, 2023

eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem.

Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022

Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack Between QoS-Constrained Applications.

ACM Trans. Archit. Code Optim., 2022

Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints.

ACM Trans. Archit. Code Optim., 2022

Bounding the execution time of parallel applications on unrelated multiprocessors.

Real Time Syst., 2022

GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases.

Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021

Federated Scheduling of Sporadic DAGs on Unrelated Multiprocessors.

ACM Trans. Embed. Comput. Syst., 2021

CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling.

Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021

2020

Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems.

J. Parallel Distributed Comput., 2020

Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

A GPU Register File using Static Data Compression.

Alexandra Angerd

Erik Sintorn

Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019

Trends on heterogeneous and innovative hardware and software systems.

J. Parallel Distributed Comput., 2019

QoS-Driven Coordinated Management of Resources to Save Energy in Multi-core Systems.

Mehrzad Nejat

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems.

Proceedings of the 48th International Conference on Parallel Processing, 2019

2018

Scheduling Parallel Real-Time Recurrent Tasks on Multicore Platforms.

IEEE Trans. Parallel Distributed Syst., 2018

Global Dead-Block Management for Task-Parallel Programs.

ACM Trans. Archit. Code Optim., 2018

ProFess: A Probabilistic Hybrid Main Memory Management Framework for High Performance and Fairness.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018

2017

SLOOP: QoS-Supervised Loop Execution to Reduce Energy on Heterogeneous Architectures.

ACM Trans. Archit. Code Optim., 2017

A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs.

Alexandra Angerd

Erik Sintorn

ACM Trans. Archit. Code Optim., 2017

Runtime-Assisted Global Cache Management for Task-Based Parallel Programs.

IEEE Comput. Archit. Lett., 2017

Timing-Anomaly Free Dynamic Scheduling of Task-Based Parallel Applications.

Proceedings of the 2017 IEEE Real-Time and Embedded Technology and Applications Symposium, 2017

Rock: a framework for pruning the design space of hybrid main memory systems.

Proceedings of the International Symposium on Memory Systems, 2017

2016

2015 Maurice Wilkes Award Given to Christos Kozyrakis.

IEEE Micro, 2016

PATer: A Hardware Prefetching Automatic Tuner on IBM POWER8 Processor.

IEEE Comput. Archit. Lett., 2016

Adaptive Row Addressing for Cost-Efficient Parallel Memory Protocols in Large-Capacity Memories.

Proceedings of the Second International Symposium on Memory Systems, 2016

RADAR: Runtime-assisted dead region management for last-level caches.

Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016

EUROSERVER: Share-anything scale-out micro-server design.

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

2015

A Primer on Compression in the Memory Hierarchy

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01751-3, 2015

HyComp: a hybrid cache compression method for selection of data-type-specific compression methods.

Proceedings of the 48th International Symposium on Microarchitecture, 2015

Performance Impact of Batching Web-Application Requests Using Hot-Spot Processing on GPUs.

Tobias Fjalling

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Enhancing Garbage Collection Synchronization Using Explicit Bit Barriers.

J. Rubén Titos Gil

Proceedings of the 44th International Conference on Parallel Processing, 2015

2014

ZEBRA: Data-Centric Contention Management in Hardware Transactional Memory.

IEEE Trans. Parallel Distributed Syst., 2014

Characterizing and Exploiting Small-Value Memory Instructions.

IEEE Trans. Computers, 2014

Introduction to the JPDC special issue on Perspectives on Parallel and Distributed Processing.

Viktor K. Prasanna

Yves Robert

J. Parallel Distributed Comput., 2014

Removal of Conflicts in Hardware Transactional Memory Systems.

Int. J. Parallel Program., 2014

A Case for a Value-Aware Cache.

IEEE Comput. Archit. Lett., 2014

Overhead-aware temporal partitioning on multicore processors.

Proceedings of the 20th IEEE Real-Time and Embedded Technology and Applications Symposium, 2014

SC²: A statistical compression cache scheme.

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Crystal: A Design-Time Resource Partitioning Method for Hybrid Main Memory.

Georgi Gaydadjiev

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Effective resource management towards efficient computing.

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Eager Beats Lazy: Improving Store Management in Eager Hardware Transactional Memory.

IEEE Trans. Parallel Distributed Syst., 2013

Moving from petaflops to petadata.

Michael J. Flynn

Oskar Mencer

Veljko M. Milutinovic

Commun. ACM, 2013

Efficient Forwarding of Producer-Consumer Data in Task-Based Programs.

Anurag Negi

Proceedings of the 42nd International Conference on Parallel Processing, 2013

HARP: Adaptive abort recurrence prediction for Hardware Transactional Memory.

Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Improving data access efficiency by using a tagless access buffer (TAB).

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Keynote talk: Towards automatic resource management in parallel architectures.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

Introduction to the special issue on high-performance and embedded architectures and compilers.

Koen De Bosschere

ACM Trans. Archit. Code Optim., 2012

Critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications.

Guancheng Chen

Mridha-Mohammad Waliullah

Proceedings of the SC Conference on High Performance Computing Networking, 2012

π-TM: Pessimistic invalidation for scalable lazy hardware transactional memory.

Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012

Transactional prefetching: narrowing the window of contention in hardware transactional memory.

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

Classification and Elimination of Conflicts in Hardware Transactional Memory Systems.

Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Panel Statement.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

The Impact of Non-coherent Buffers on Lazy Hardware Transactional Memory Systems.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Poster: implications of merging phases on scalability of multi-core architectures.

Ben H. H. Juurlink

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

ZEBRA: a data-centric, hybrid-policy hardware transactional memory design.

Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Eager Meets Lazy: The Impact of Write-Buffering on Hardware Transactional Memory.

Proceedings of the International Conference on Parallel Processing, 2011

Implications of Merging Phases on Scalability of Multi-core Architectures.

Ben H. H. Juurlink

Proceedings of the International Conference on Parallel Processing, 2011

A unified approach to eliminate memory accesses early.

Proceedings of the 14th International Conference on Compilers, 2011

Pi-TM: Pessimistic Invalidation for Scalable Lazy Hardware Transactional Memory.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010

The Velox Transactional Memory Stack.

IEEE Micro, 2010

LV*: A low complexity lazy versioning HTM infrastructure.

Anurag Negi

Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

LV*: a class of lazy versioning HTMs for low-cost integration of transactional memory systems.

Anurag Negi

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010

Characterization and exploitation of narrow-width loads: the narrow-width cache approach.

Proceedings of the 2010 International Conference on Compilers, 2010

2009

FlexCore: Utilizing Exposed Datapath Control for Efficient Computing.

J. Signal Process. Syst., 2009

Introduction.

David B. Whalley

Trans. High Perform. Embed. Archit. Compil., 2009

Schemes for avoiding starvation in transactional memory systems.

Concurr. Comput. Pract. Exp., 2009

Cancellation of loads that return zero using zero-value caches.

Sally A. McKee

Proceedings of the 23rd international conference on Supercomputing, 2009

A Flexible Code Compression Scheme Using Partitioned Look-Up Tables.

Magnus Själander

Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Zero-Value Caches: Cancelling Loads that Return Zero.

Proceedings of the PACT 2009, 2009

Using Hoarding to Increase Availability in Shared File Systems.

Proceedings of the 8th IEEE/ACIS International Conference on Computer and Information Science, 2009

2008

The worst-case execution-time problem - overview of methods and survey of tools.

ACM Trans. Embed. Comput. Syst., 2008

Memory-Link Compression Schemes: A Value Locality Perspective.

Lawrence Spracklen

IEEE Trans. Computers, 2008

Early detection and bypassing of trivial operations to improve energy efficiency of processors.

Magnus Själander

Microprocess. Microsystems, 2008

Simple Penalty-Sensitive Cache Replacement Policies.

Jaeheon Jeong

J. Instr. Level Parallelism, 2008

Dual-thread Speculation: A Simple Approach to Uncover Thread-level Parallelism on a Simultaneous Multithreaded Processor.

Int. J. Parallel Program., 2008

Efficient management of speculative data in hardware transactional memory systems.

Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008

Zero loads: canceling load requests by tracking zero values.

Proceedings of the 9th workshop on MEmory performance, 2008

Intermediate checkpointing with conflicting access prediction in transactional memory systems.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Accommodation of the Bandwidth of Large Cache Blocks Using Cache/Memory Link Compression.

Proceedings of the 2008 International Conference on Parallel Processing, 2008

Leveraging Data Promotion for Low Power D-NUCA Caches.

Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

2007

Introduction to Part 1.

Dionisios N. Pnevmatikatos

Trans. High Perform. Embed. Archit. Compil., 2007

High-Performance Embedded Architecture and Compilation Roadmap.

Michael F. P. O'Boyle

Trans. High Perform. Embed. Archit. Compil., 2007

Starvation-free commit arbitration policies for transactional memory systems.

SIGARCH Comput. Archit. News, 2007

An LRU-based replacement algorithm augmented with frequency of access in shared chip-multiprocessor caches.

Lasse Natvig

SIGARCH Comput. Archit. News, 2007

Improving power efficiency of D-NUCA caches.

SIGARCH Comput. Archit. News, 2007

SimWattch: Integrating Complete-System and User-Level Performance and Power Simulators.

Jianwei Chen

IEEE Micro, 2007

Effectiveness of caching in a distributed digital library system.

Anders Ardö

J. Syst. Archit., 2007

Energy and Performance Trade-offs between Instruction Reuse and Trivial Computations for Embedded Applications.

Proceedings of the IEEE Second International Symposium on Industrial Embedded Systems, 2007

Characterization of Apache web server with Specweb2005.

Proceedings of the 2007 workshop on MEmory performance, 2007

IPDPS Panel: Is the Multi-Core Roadmap going to Live Up to its Promises?

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Loop-level Speculative Parallelism in Embedded Applications.

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors.

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Starvation-Free Transactional Memory-System Protocols.

Proceedings of the Euro-Par 2007, 2007

Microprocessors in the era of terascale integration.

Shekhar Borkar

Norman P. Jouppi

Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007

Implicit Transactional Memory in Kilo-Instruction Multiprocessors.

Proceedings of the Advances in Computer Systems Architecture, 2007

2006

Introduction.

J. Parallel Distributed Comput., 2006

Dual-Thread Speculation: Two Threads in the Machine are Worth Eight in the Bush.

Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

Scalable Value-Cache Based Compression Schemes for Multiprocessors.

Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

Reduction of Energy Consumption in Processors by Early Detection and Bypassing of Trivial Operations.

Proceedings of 2006 International Conference on Embedded Computer Systems: Architectures, 2006

Chip-multiprocessing and beyond.

Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006

A Cache-Partitioning Aware Replacement Policy for Chip Multiprocessors.

Lasse Natvig

Proceedings of the High Performance Computing, 2006

Simple penalty-sensitive replacement policies for caches.

Jaeheon Jeong

Proceedings of the Third Conference on Computing Frontiers, 2006

Enhancing Last-Level Cache Performance by Block Bypassing and Early Miss Determination.

Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006

2005

Introduction to the special issue.

Frank Mueller

ACM Trans. Embed. Comput. Syst., 2005

Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

A Robust Main-Memory Compression Scheme.

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

A Cost-Effective Main Memory Organization for Future Servers.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Implementing Kilo-Instruction Multiprocessors.

Proceedings of the International Conference on Pervasive Services 2005, 2005

The Chip-Multiprocessing Paradigm Shift: Opportunities and Challenges.

Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Reducing misspeculation overhead for module-level speculative execution.

Proceedings of the Second Conference on Computing Frontiers, 2005

Evaluation of extended dictionary-based static code compression schemes.

Proceedings of the Second Conference on Computing Frontiers, 2005

2004

A cache block reuse prediction scheme.

Jonas Jalminger

Microprocess. Microsystems, 2004

A comparative evaluation of hardware-only and software-only directory protocols in shared-memory multiprocessors.

J. Syst. Archit., 2004

A case for multi-level main memory.

Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Self-correcting LRU replacement policies.

Martin Kämpe

Proceedings of the First Conference on Computing Frontiers, 2004

2003

Integrating complete-system and user-level performance/power simulators: the SimWattch approach.

Jianwei Chen

Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Improving Speculative Thread-Level Parallelism Through Module Run-Length Prediction.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Speculative Lock Reordering: Optimistic Out-of-Order Execution of Critical Sections.

Peter Rundberg

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

The Coherence Predictor Cache: A Resource-Efficient and Accurate Coherence Prediction Infrastructure.

Jim Nilsson

Anders Landin

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

A Novel Approach to Cache Block Reuse Predictions.

Jonas Jalminger

Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

Performance and Power Impact of Issue-width in Chip-Multiprocessor Cores.

Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003

One Chip, One Server: How Do We Exploit Its Power?

Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

An Evaluation of Document Prefetching in a Distributed Digital Library.

Anders Ardö

Proceedings of the Research and Advanced Technology for Digital Libraries, 2003

2002

Improvement of energy-efficiency in off-chip caches by selective prefetching.

Jonas Jalminger

Microprocess. Microsystems, 2002

TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors.

Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Empirical Observations Regarding Predictability in User Access-Behavior in a Distributed Digital Library System.

Anders Ardö

Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

The FAB Predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches.

Martin Kämpe

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

2001

An All-Software Thread-Level Data Dependence Speculation System for Multiprocessors.

Peter Rundberg

J. Instr. Level Parallelism, 2001

A Case Study of Load Distribution in Parallel View Frustum Culling and Collision Detection.

Ulf Assarsson

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

Limits on Speculative Module-Level Parallelism in Imperative and Object-Oriented Programs on CMP Platforms.

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Comparative Evaluation of Latency-Tolerating and -Reducing Techniques for Hardware-Only and Software-Only Directory Protocols.

J. Parallel Distributed Comput., 2000

Shared-memory multiprocessing: Current state and future directions.

Adv. Comput., 2000

An analytical model of the working-set sizes in decision-support systems.

Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2000

Recency-based TLB preloading.

Ashley Saulsbury

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

A Prefetching Technique for Irregular Accesses to Linked Data Structures.

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

Parallel Computer Architecture.

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999

An Integrated Path and Timing Analysis Method based on Cycle-Level Symbolic Execution.

Real Time Syst., 1999

Special Issue On Distributed Shared Memory Systems.

Veljko Milutinovic

Proc. IEEE, 1999

Evaluation of Compiler-Controlled Updating to Reduce Coherence-Miss Penalties in Shared-Memory Multiprocessors.

J. Parallel Distributed Comput., 1999

Timing Anomalies in Dynamically Scheduled Microprocessors.

Proceedings of the 20th IEEE Real-Time Systems Symposium, 1999

A Method to Improve the Estimated Worst-Case Performance of Data Caching.

Proceedings of the 6th International Workshop on Real-Time Computing and Applications Symposium (RTCSA '99), 1999

1998

Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors.

IEEE Trans. Computers, 1998

An evaluation of hardware-based and compiler-controlled optimizations of snooping cache protocols.

Future Gener. Comput. Syst., 1998

A holistic approach to computer system design education based on system simulation techniques.

Proceedings of the 1998 workshop on Computer architecture education, 1998

SimICS/Sun4m: A Virtual Workstation.

Proceedings of the 1998 USENIX Annual Technical Conference, 1998

Integrating Path and Timing Analysis Using Instruction-Level Simulation Techniques.

Proceedings of the Languages, 1998

1997

Effectiveness of Dynamic Prefetching in Multiple-Writer Distributed Virtual Shared-Memory Systems.

J. Parallel Distributed Comput., 1997

Trends in Shared Memory Multiprocessing.

Computer, 1997

Boosting the Performance of Shared Memory Multiprocessors.

Computer, 1997

Reducing the Read-Miss Penalty for Flat COMA Protocols.

Mårten Björkman

Comput. J., 1997

Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques.

Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

A Performance Tuning Approach for Shared-Memory Multiprocessors.

Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996

Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors.

IEEE Trans. Parallel Distributed Syst., 1996

Using Dataflow Analysis Techniques to Reduce Ownership Overhead in Cache Coherence Protocols.

ACM Trans. Program. Lang. Syst., 1996

Characterising and Modelling Shared Memory Accesses in Multiprocessor Programs.

Mats Brorsson

Parallel Comput., 1996

The design of a non-blocking load processor architecture.

Magnus Balldin

Microprocess. Microsystems, 1996

Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection.

J. Parallel Distributed Comput., 1996

Applications for Shared Memory Multiprocessors (Guest Editors' Introduction).

Computer, 1996

Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM Switches and Bus-Based Multiprocessor Servers.

Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996

1995

Sequential Hardware Prefetching in Shared-Memory Multiprocessors.

IEEE Trans. Parallel Distributed Syst., 1995

Essential Misses and Data Traffic in Coherence Protocols.

J. Parallel Distributed Comput., 1995

Using Write Caches to Improve Performance of Cache Coherence Protocols in Shared-Memory Multiprocessors.

J. Parallel Distributed Comput., 1995

Implementation and evaluation of update-based cache protocols under relaxed memory consistency models.

Future Gener. Comput. Syst., 1995

Efficient Strategies for Software-Only Protocols in Shared-Memory Multiprocessors.

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Effectiveness of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors.

Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Using hints to reduce the read miss penalty for flat COMA protocols.

Mårten Björkman

Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995

A compiler algorithm that reduces read latency in ownership-based cache coherence protocols.

Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994

Modelling accesses to migratory and producer-consumer characterised data in a shared memory multiprocessor.

Mats Brorsson

Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994

An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic.

Håkan Nilsson

Proceedings of the PARLE '94: Parallel Architectures and Languages Europe, 1994

Combined Performance Gains of Simple Cache Protocol Extensions.

Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

An Integrated Methodology for the Verification of Directory-Based Cache Protocols.

Fong Pong

Proceedings of the 1994 International Conference on Parallel Processing, 1994

Reducing the Write Traffic for a Hybrid Cache Protocol.

Proceedings of the 1994 International Conference on Parallel Processing, 1994

Introduction.

Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

Simple Compiler Algorithms to Reduce Ownership Overhead in Cache Coherence Protocols.

Proceedings of ASPLOS-VI, 1994

1993

An Adaptive Cache Coherence Protocol Optimized for Migratory Sharing.

Mats Brorsson

Lars Sandberg

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

The Detection and Elimination of Useless Misses in Multiprocessors.

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors.

Proceedings of the 1993 International Conference on Parallel Processing, 1993

The Cachemire Test Bench: A Flexible and Effective Approach for Simulation of Multiprocessors.

Proceedings of the 26th Annual Simulation Symposium, ANSS 1993, 1993

1992

The Scalable Tree Protocol - A Cache Coherence Approach for Large-Scale Multiprocessors.

Håkan Nilsson

Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, 1992

Comparative Performance Evaluation of Cache-Coherent NUMA and COMA Architectures.

Truman Joe

Anoop Gupta

Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

A Latency-Hiding Scheme for Multiprocessors with Buffered Multistage Networks.

Proceedings of the 6th International Parallel Processing Symposium, 1992

1991

On Reconfigurable On-Chip Data Caches.

Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991

A Lockup-Free Multiprocessor Cache Design.

Lars Lundberg

Proceedings of the International Conference on Parallel Processing, 1991

1990

A Survey of Cache Coherence Schemes for Multiprocessors.

Computer, 1990

1989

A Cache Consistency Protocol for Multiprocessors with Multistage Networks.

Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989

1988

Reducing Contention in Shared-Memory Multiprocessors.

Computer, 1988

1987

A Layered Emulator for Design Evaluation of MIMD Multiprocessors with Shared Memory.
