Per Stenström
Orcid: 0000-0002-7441-8245
Affiliations:
- Chalmers University of Technology, Goteborg, Sweden
According to our database, Per Stenström authored at least 194 papers between 1987 and 2024.
Awards
IEEE Fellow 2007, "For contributions to design of high-performance memory systems".
Links
Online presence:
- on chalmers.se
- on orcid.org
- on id.loc.gov
- on d-nb.info
Bibliography
2024
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024
2023
Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints.
ACM Trans. Archit. Code Optim., September, 2023
Proceedings of the IEEE International Symposium on Hardware Oriented Security and Trust, 2023
SoK: Analysis of Root Causes and Defense Strategies for Attacks on Microarchitectural Optimizations.
Proceedings of the 8th IEEE European Symposium on Security and Privacy, 2023
eProcessor: European, Extendable, Energy-Efficient, Extreme-Scale, Extensible, Processor Ecosystem.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023
2022
Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack Between QoS-Constrained Applications.
ACM Trans. Archit. Code Optim., 2022
Task-RM: A Resource Manager for Energy Reduction in Task-Parallel Applications under Quality of Service Constraints.
ACM Trans. Archit. Code Optim., 2022
Real Time Syst., 2022
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
2021
ACM Trans. Embed. Comput. Syst., 2021
CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling.
Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques, 2021
2020
Coordinated management of DVFS and cache partitioning under QoS constraints to save energy in multi-core systems.
J. Parallel Distributed Comput., 2020
Coordinated Management of Processor Configuration and Cache Partitioning to Optimize Energy under QoS Constraints.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
DELTA: Distributed Locality-Aware Cache Partitioning for Tile-based Chip Multiprocessors.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020
2019
J. Parallel Distributed Comput., 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
SaC: Exploiting Execution-Time Slack to Save Energy in Heterogeneous Multicore Systems.
Proceedings of the 48th International Conference on Parallel Processing, 2019
2018
IEEE Trans. Parallel Distributed Syst., 2018
ACM Trans. Archit. Code Optim., 2018
ProFess: A Probabilistic Hybrid Main Memory Management Framework for High Performance and Fairness.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2018
2017
SLOOP: QoS-Supervised Loop Execution to Reduce Energy on Heterogeneous Architectures.
ACM Trans. Archit. Code Optim., 2017
A Framework for Automated and Controlled Floating-Point Accuracy Reduction in Graphics Applications on GPUs.
ACM Trans. Archit. Code Optim., 2017
IEEE Comput. Archit. Lett., 2017
Proceedings of the 2017 IEEE Real-Time and Embedded Technology and Applications Symposium, 2017
Proceedings of the International Symposium on Memory Systems, 2017
2016
IEEE Comput. Archit. Lett., 2016
Adaptive Row Addressing for Cost-Efficient Parallel Memory Protocols in Large-Capacity Memories.
Proceedings of the Second International Symposium on Memory Systems, 2016
Proceedings of the 2016 IEEE International Symposium on High Performance Computer Architecture, 2016
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016
2015
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01751-3, 2015
HyComp: a hybrid cache compression method for selection of data-type-specific compression methods.
Proceedings of the 48th International Symposium on Microarchitecture, 2015
Performance Impact of Batching Web-Application Requests Using Hot-Spot Processing on GPUs.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Proceedings of the 44th International Conference on Parallel Processing, 2015
2014
IEEE Trans. Parallel Distributed Syst., 2014
IEEE Trans. Computers, 2014
Introduction to the JPDC special issue on Perspectives on Parallel and Distributed Processing.
J. Parallel Distributed Comput., 2014
Int. J. Parallel Program., 2014
Proceedings of the 20th IEEE Real-Time and Embedded Technology and Applications Symposium, 2014
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Performance and Energy Analysis of the Restricted Transactional Memory Implementation on Haswell.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014
2013
IEEE Trans. Parallel Distributed Syst., 2013
Proceedings of the 42nd International Conference on Parallel Processing, 2013
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013
2012
Introduction to the special issue on high-performance and embedded architectures and compilers.
ACM Trans. Archit. Code Optim., 2012
Critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Proceedings of the 18th IEEE International Symposium on High Performance Computer Architecture, 2012
Transactional prefetching: narrowing the window of contention in hardware transactional memory.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012
2011
Classification and Elimination of Conflicts in Hardware Transactional Memory Systems.
Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011
Proceedings of the International Conference on Parallel Processing, 2011
Proceedings of the International Conference on Parallel Processing, 2011
Proceedings of the 14th International Conference on Compilers, 2011
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
2010
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010
LV*: a class of lazy versioning HTMs for low-cost integration of transactional memory systems.
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010
Characterization and exploitation of narrow-width loads: the narrow-width cache approach.
Proceedings of the 2010 International Conference on Compilers, 2010
2009
J. Signal Process. Syst., 2009
Concurr. Comput. Pract. Exp., 2009
Proceedings of the 23rd international conference on Supercomputing, 2009
Proceedings of the High Performance Embedded Architectures and Compilers, 2009
Proceedings of the PACT 2009, 2009
Proceedings of the 8th IEEE/ACIS International Conference on Computer and Information Science, 2009
2008
ACM Trans. Embed. Comput. Syst., 2008
IEEE Trans. Computers, 2008
Early detection and bypassing of trivial operations to improve energy efficiency of processors.
Microprocess. Microsystems, 2008
J. Instr. Level Parallelism, 2008
Dual-thread Speculation: A Simple Approach to Uncover Thread-level Parallelism on a Simultaneous Multithreaded Processor.
Int. J. Parallel Program., 2008
Proceedings of the 2008 International Conference on Embedded Computer Systems: Architectures, 2008
Proceedings of the 9th workshop on MEmory performance, 2008
Intermediate checkpointing with conflicting access prediction in transactional memory systems.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Accommodation of the Bandwidth of Large Cache Blocks Using Cache/Memory Link Compression.
Proceedings of the 2008 International Conference on Parallel Processing, 2008
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008
2007
Trans. High Perform. Embed. Archit. Compil., 2007
SIGARCH Comput. Archit. News, 2007
An LRU-based replacement algorithm augmented with frequency of access in shared chip-multiprocessor caches.
SIGARCH Comput. Archit. News, 2007
SimWattch: Integrating Complete-System and User-Level Performance and Power Simulators.
IEEE Micro, 2007
J. Syst. Archit., 2007
Energy and Performance Trade-offs between Instruction Reuse and Trivial Computations for Embedded Applications.
Proceedings of the IEEE Second International Symposium on Industrial Embedded Systems, 2007
Proceedings of the 2007 workshop on MEmory performance, 2007
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007
Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007
Proceedings of the Euro-Par 2007, 2007
Proceedings of the 2007 Design, Automation and Test in Europe Conference and Exposition, 2007
Proceedings of the Advances in Computer Systems Architecture, 2007
2006
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006
Reduction of Energy Consumption in Processors by Early Detection and Bypassing of Trivial Operations.
Proceedings of 2006 International Conference on Embedded Computer Systems: Architectures, 2006
Proceedings of the 12th International Symposium on High-Performance Computer Architecture, 2006
Proceedings of the High Performance Computing, 2006
Proceedings of the Third Conference on Computing Frontiers, 2006
Enhancing Last-Level Cache Performance by Block Bypassing and Early Miss Determination.
Proceedings of the Advances in Computer Systems Architecture, 11th Asia-Pacific Conference, 2006
2005
Enhancing Multiprocessor Architecture Simulation Speed Using Matched-Pair Comparison.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the International Conference on Pervasive Services 2005, 2005
Proceedings of the High Performance Embedded Architectures and Compilers, 2005
Proceedings of the Second Conference on Computing Frontiers, 2005
Proceedings of the Second Conference on Computing Frontiers, 2005
2004
A comparative evaluation of hardware-only and software-only directory protocols in shared-memory multiprocessors.
J. Syst. Archit., 2004
Proceedings of the 3rd Workshop on Memory Performance Issues, 2004
Proceedings of the First Conference on Computing Frontiers, 2004
2003
Integrating complete-system and user-level performance/power simulators: the SimWattch approach.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
The Coherence Predictor Cache: A Resource-Efficient and Accurate Coherence Prediction Infrastructure.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003
Proceedings of the 32nd International Conference on Parallel Processing (ICPP 2003), 2003
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003
Proceedings of the Research and Advanced Technology for Digital Libraries, 2003
2002
Microprocess. Microsystems, 2002
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002
Empirical Observations Regarding Predictability in User Access-Behavior in a Distributed Digital Library System.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002
The FAB Predictor: Using Fourier Analysis to Predict the Outcome of Conditional Branches.
Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002
2001
J. Instr. Level Parallelism, 2001
A Case Study of Load Distribution in Parallel View Frustum Culling and Collision Detection.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001
Limits on Speculative Module-Level Parallelism in Imperative and Object-Oriented Programs on CMP Platforms.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001
2000
Comparative Evaluation of Latency-Tolerating and -Reducing Techniques for Hardware-Only and Software-Only Directory Protocols.
J. Parallel Distributed Comput., 2000
Adv. Comput., 2000
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, 2000
Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000
1999
An Integrated Path and Timing Analysis Method based on Cycle-Level Symbolic Execution.
Real Time Syst., 1999
Evaluation of Compiler-Controlled Updating to Reduce Coherence-Miss Penalties in Shared-Memory Multiprocessors.
J. Parallel Distributed Comput., 1999
Proceedings of the 20th IEEE Real-Time Systems Symposium, 1999
Proceedings of the 6th International Workshop on Real-Time Computing and Applications Symposium (RTCSA '99), 1999
1998
Performance Evaluation and Cost Analysis of Cache Protocol Extensions for Shared-Memory Multiprocessors.
IEEE Trans. Computers, 1998
An evaluation of hardware-based and compiler-controlled optimizations of snooping cache protocols.
Future Gener. Comput. Syst., 1998
A holistic approach to computer system design education based on system simulation techniques.
Proceedings of the 1998 workshop on Computer architecture education, 1998
Proceedings of the 1998 USENIX Annual Technical Conference, 1998
Proceedings of the Languages, 1998
1997
Effectiveness of Dynamic Prefetching in Multiple-Writer Distributed Virtual Shared-Memory Systems.
J. Parallel Distributed Comput., 1997
Relative Performance of Hardware and Software-Only Directory Protocols Under Latency Tolerating and Reducing Techniques.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997
Proceedings of the Euro-Par '97 Parallel Processing, 1997
1996
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 1996
Using Dataflow Analysis Techniques to Reduce Ownership Overhead in Cache Coherence Protocols.
ACM Trans. Program. Lang. Syst., 1996
Parallel Comput., 1996
Microprocess. Microsystems, 1996
Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection.
J. Parallel Distributed Comput., 1996
Computer, 1996
Performance Evaluation of a Cluster-Based Multiprocessor Built from ATM Switches and Bus-Based Multiprocessor Servers.
Proceedings of the Second International Symposium on High-Performance Computer Architecture, 1996
1995
IEEE Trans. Parallel Distributed Syst., 1995
J. Parallel Distributed Comput., 1995
Using Write Caches to Improve Performance of Cache Coherence Protocols in Shared-Memory Multiprocessors.
J. Parallel Distributed Comput., 1995
Implementation and evaluation of update-based cache protocols under relaxed memory consistency models.
Future Gener. Comput. Syst., 1995
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995
Effectiveness of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995
Proceedings of the 28th Annual Hawaii International Conference on System Sciences (HICSS-28), 1995
A compiler algorithm that reduces read latency in ownership-based cache coherence protocols.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995
1994
Modelling accesses to migratory and producer-consumer characterised data in a shared memory multiprocessor.
Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994
An Adaptive Update-Based Cache Coherence Protocol for Reduction of Miss Rate and Traffic.
Proceedings of the PARLE '94: Parallel Architectures and Languages Europe, 1994
Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994
Proceedings of the 1994 International Conference on Parallel Processing, 1994
Proceedings of the 1994 International Conference on Parallel Processing, 1994
Introduction.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994
Simple Compiler Algorithms to Reduce Ownership Overhead in Cache Coherence Protocols.
Proceedings of ASPLOS-VI, 1994
1993
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993
Proceedings of the 1993 International Conference on Parallel Processing, 1993
The Cachemire Test Bench - A Flexible and Effective Approach for Simulation of Multiprocessors.
Proceedings of the 26th Annual Simulation Symposium, ANSS 1993, 1993
1992
The Scalable Tree Protocol - A Cache Coherence Approach for Large-Scale Multiprocessors.
Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, 1992
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992
Proceedings of the 6th International Parallel Processing Symposium, 1992
1991
Proceedings of the 24th Annual IEEE/ACM International Symposium on Microarchitecture, 1991
A Lockup-Free Multiprocessor Cache Design.
Proceedings of the International Conference on Parallel Processing, 1991
1990
1989
Proceedings of the 16th Annual International Symposium on Computer Architecture. Jerusalem, 1989
1988
1987
Proceedings of the PARLE, 1987