Erik Hagersten

  • Uppsala University, Sweden

According to our database, Erik Hagersten authored at least 76 papers between 1989 and 2019.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.






Directed Statistical Warming through Time Traveling.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Tail-PASS: Resource-Based Cache Management for Tiled Graphics Rendering Hardware.
Proceedings of the IEEE International Conference on Parallel & Distributed Processing with Applications, 2018

Exploring Scheduling Effects on Task Performance with TaskInsight.
Supercomput. Front. Innov., 2017

Understanding the interplay between task scheduling, memory and performance.
Proceedings Companion of the 2017 ACM SIGPLAN International Conference on Systems, 2017

A graphics tracing framework for exploring CPU+GPU memory systems.
Proceedings of the 2017 IEEE International Symposium on Workload Characterization, 2017

A Split Cache Hierarchy for Enabling Data-Oriented Optimizations.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

POSTER: Putting the G back into GPU/CPU Systems Research.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

Analytical Processor Performance and Power Modeling Using Micro-Architecture Independent Characteristics.
IEEE Trans. Computers, 2016

Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead.
ACM Trans. Archit. Code Optim., 2016

CoolSim: Statistical techniques to replace cache warming with efficient, virtualized profiling.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

CoolSim: Eliminating traditional cache warming with fast, virtualized profiling.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Message from the general chair.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Data placement across the cache hierarchy: Minimizing data movement with reuse-aware placement.
Proceedings of the 34th IEEE International Conference on Computer Design, 2016

Formalizing Data Locality in Task Parallel Applications.
Proceedings of the Algorithms and Architectures for Parallel Processing, 2016

The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence.
ACM Trans. Archit. Code Optim., 2015

Long term parking (LTP): criticality-aware resource allocation in OOO processors.
Proceedings of the 48th International Symposium on Microarchitecture, 2015

Micro-architecture independent analytical processor performance and power modeling.
Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software, 2015

Cost-effective speculative scheduling in high performance processors.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

StatTask: reuse distance analysis for task-based applications.
Proceedings of the 2015 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2015

AREP: Adaptive Resource Efficient Prefetching for Maximizing Multicore Performance.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

An Efficient, Self-Contained, On-chip Directory: DIR1-SISD.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Resource conscious prefetching for irregular applications in multicores.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

Extending statistical cache models to support detailed pipeline simulators.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

A software based profiling method for obtaining speedup stacks on commodity multi-cores.
Proceedings of the 2014 IEEE International Symposium on Performance Analysis of Systems and Software, 2014

Navigating the cache hierarchy with a single lookup.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

A Case for Resource Efficient Prefetching in Multicores.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

TLC: a tag-less cache for reducing dynamic first level cache energy.
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, 2013

Modeling performance variation due to cache sharing.
Proceedings of the 19th IEEE International Symposium on High Performance Computer Architecture, 2013

Bandwidth Bandit: Quantitative characterization of memory contention.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Low Overhead Instruction-Cache Modeling Using Instruction Reuse Profiles.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

The HOPSA Workflow and Tools.
Proceedings of the Tools for High Performance Computing 2012, 2012

Bandwidth bandit: Understanding memory contention.
Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Phase behavior in serial and parallel applications.
Proceedings of the 2012 IEEE International Symposium on Workload Characterization, 2012

Phase guided profiling for fast cache modeling.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Efficient techniques for predicting cache sharing and throughput.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

Efficient software-based online phase classification.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

Cache Pirating: Measuring the Curse of the Shared Cache.
Proceedings of the International Conference on Parallel Processing, 2011

Fast modeling of shared caches in multicore systems.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011

Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses.
Proceedings of the Conference on High Performance Computing Networking, 2010

StatStack: Efficient modeling of LRU caches.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

StatCC: a statistical cache contention model.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Reconsidering algorithms for iterative solvers in the multicore era.
Int. J. Comput. Sci. Eng., 2009

Improving Cache Utilization Using Acumem VPE.
Proceedings of the Tools for High Performance Computing, 2008

A case for low-complexity MP architectures.
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007

Conserving Memory Bandwidth in Chip Multiprocessors with Runahead Execution.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

A statistical multiprocessor cache model.
Proceedings of the 2006 IEEE International Symposium on Performance Analysis of Systems and Software, 2006

Exploiting locality: a flexible DSM approach.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Modeling Cache Sharing on Chip Multiprocessor Architectures.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

TMA: a trap-based memory architecture.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Fast data-locality profiling of native execution.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2005

VASA: A Simulator Infrastructure with Adjustable Fidelity.
Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2005

Exploring Processor Design Options for Java-Based Middleware.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Skewed caches from a low-power perspective.
Proceedings of the Second Conference on Computing Frontiers, 2005

StatCache: a probabilistic approach to efficient and accurate data locality analysis.
Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software, 2004

Bundling: Reducing the Overhead of Multiprocessor Prefetchers.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Exploiting Spatial Store Locality Through Permission Caching in Software DSMs.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Miss Penalty Reduction Using Bundled Capacity Prefetching in Multiprocessors.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Hierarchical Backoff Locks for Nonuniform Communication Architectures.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Memory System Behavior of Java-Based Middleware.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

THROOM - Supporting POSIX Multithreaded Binaries on a Cluster.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Efficient synchronization for nonuniform communication architectures.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

SIP: Performance Tuning through Source Code Interdependence.
Proceedings of the Euro-Par 2002, 2002

Removing the overhead from software-based shared memory.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Shared-memory multiprocessing: Current state and future directions.
Adv. Comput., 2000

High-Performance Computers: Yesterday, Today, and Tomorrow.
Proceedings of the Applied Parallel Computing, 2000

Parallel computing in the commercial marketplace: research and innovation at work.
Proc. IEEE, 1999

WildFire: A Scalable Path for SMPs.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Trends in Shared Memory Multiprocessing.
Computer, 1997

Queue Locks on Cache Coherent Multiprocessors.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

Simple COMA Node Implementations.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

Simulating the Data Diffusion Machine.
Proceedings of the PARLE '93, 1993

DDM - A Cache-Only Memory Architecture.
Computer, 1992

Race-Free Interconnection Networks and Multiprocessor Consistency.
Proceedings of the 18th Annual International Symposium on Computer Architecture. Toronto, 1991

The Cache Coherence Protocol of the Data Diffusion Machine.
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989
