Stefanos Kaxiras

Orcid: 0000-0001-8267-0232

Affiliations:
  • Uppsala University


According to our database, Stefanos Kaxiras authored at least 130 papers between 1991 and 2024.

Bibliography

2024
TaDA: Task Decoupling Architecture for the Battery-less Internet of Things.
Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems, 2024

A First Exploration of Fine-Grain Coherence for Integrity Metadata.
Proceedings of the International Symposium on Secure and Private Execution Environment Design, 2024

2023
Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks.
ACM Trans. Archit. Code Optim., March, 2023

Speculative inter-thread store-to-load forwarding in SMT architectures.
J. Parallel Distributed Comput., March, 2023

ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage.
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes.
Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

How addresses are made.
Proceedings of the IEEE International Symposium on Workload Characterization, 2023

Silent Stores in the Battery-less Internet of Things: A Good Idea?
Proceedings of the 2023 International Conference on embedded Wireless Systems and Networks, 2023

2022
Analysing software prefetching opportunities in hardware transactional memory.
J. Supercomput., 2022

Data-Out Instruction-In (DOIN!): Leveraging Inclusive Caches to Attack Speculative Delay Schemes.
Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

Free atomics: hardware atomic operations without fences.
Proceedings of the ISCA '22: The 49th Annual International Symposium on Computer Architecture, New York, New York, USA, June 18, 2022

Splash-4: A Modern Benchmark Suite with Lock-Free Constructs.
Proceedings of the IEEE International Symposium on Workload Characterization, 2022

Clueless: A Tool Characterising Values Leaking as Addresses.
Proceedings of the 11th International Workshop on Hardware and Architectural Support for Security and Privacy, 2022

2021
Early Address Prediction: Efficient Pipeline Prefetch and Reuse.
ACM Trans. Archit. Code Optim., 2021

"It's a Trap!"-How Speculation Invariance Can Be Abused with Forward Speculative Interference.
CoRR, 2021

Selectively Delaying Instructions to Prevent Microarchitectural Replay Attacks.
CoRR, 2021

On Value Recomputation to Accelerate Invisible Speculation.
CoRR, 2021

Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions.
IEEE Comput. Archit. Lett., 2021

Seeds of SEED: Preventing Priority Inversion in Instruction Scheduling to Disrupt Speculative Interference.
Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

Do Not Predict - Recompute! How Value Recomputation Can Truly Boost the Performance of Invisible Speculation.
Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Splash-4: Improving Scalability with Lock-Free Constructs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

TSOPER: Efficient Coherence-Based Strict Persistency.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Understanding Selective Delay as a Method for Efficient Secure Speculative Execution.
IEEE Trans. Computers, 2020

Evaluating the Potential Applications of Quaternary Logic for Approximate Computing.
ACM J. Emerg. Technol. Comput. Syst., 2020

Speculative Enforcement of Store Atomicity.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Boosting Store Buffer Efficiency with Store-Prefetch Bursts.
Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture, 2020

Delay and Bypass: Ready and Criticality Aware Instruction Scheduling in Out-of-Order Processors.
Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Maximizing Limited Resources: a Limit-Based Study and Taxonomy of Out-of-Order Commit.
J. Signal Process. Syst., 2019

Efficient invisible speculative execution through selective delay and value prediction.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

Filter caching for free: the untapped potential of the store-buffer.
Proceedings of the 46th International Symposium on Computer Architecture, 2019

FIFOrder MicroArchitecture: Ready-Aware Instruction Scheduling for OoO Processors.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Ghost loads: what is the cost of invisible speculation?
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018
Automatic Detection of Large Extended Data-Race-Free Regions with Conflict Isolation.
IEEE Trans. Parallel Distributed Syst., 2018

Static Instruction Scheduling for High Performance on Limited Hardware.
IEEE Trans. Computers, 2018

Non-Speculative Load Reordering in Total Store Ordering.
IEEE Micro, 2018

Mending Fences with Self-Invalidation and Self-Downgrade.
Log. Methods Comput. Sci., 2018

SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

The Superfluous Load Queue.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

Non-Speculative Store Coalescing in Total Store Order.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

Dynamically Disabling Way-prediction to Reduce Instruction Replay.
Proceedings of the 36th IEEE International Conference on Computer Design, 2018

2017
Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics.
IEEE Trans. Parallel Distributed Syst., 2017

Decoupled Access-Execute on ARM big.LITTLE.
CoRR, 2017

Transcending Hardware Limits with Software Out-of-Order Processing.
IEEE Comput. Archit. Lett., 2017

Addressing Energy Challenges in Filter Caches.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

A taxonomy of out-of-order instruction commit.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

Non-Speculative Load-Load Reordering in TSO.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Clairvoyance: look-ahead compile-time scheduling.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

Automatic detection of extended data-race-free regions.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

Exploring the Performance Limits of Out-of-order Commit.
Proceedings of the Computing Frontiers Conference, 2017

2016
Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead.
ACM Trans. Archit. Code Optim., 2016

Profiling-Assisted Decoupled Access-Execute.
CoRR, 2016

Racer: TSO consistency via race detection.
Proceedings of the 49th Annual IEEE/ACM International Symposium on Microarchitecture, 2016

Splash-3: A properly synchronized benchmark suite for contemporary research.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Fencing Programs with Self-Invalidation and Self-Downgrade.
Proceedings of the Formal Techniques for Distributed Objects, Components, and Systems, 2016

Techniques for modulating error resilience in emerging multi-value technologies.
Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Multiversioned decoupled access-execute: the key to energy-efficient compilation of general-purpose programs.
Proceedings of the 25th International Conference on Compiler Construction, 2016

POSTER: Efficient Self-Invalidation/Self-Downgrade for Critical Sections with Relaxed Semantics.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
The Effects of Granularity and Adaptivity on Private/Shared Classification for Coherence.
ACM Trans. Archit. Code Optim., 2015

Callback: efficient synchronization without invalidation with a directory just for spin-waiting.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

The load slice core microarchitecture.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Full Speed Ahead: Detailed Architectural Simulation at Near-Native Speed.
Proceedings of the 2015 IEEE International Symposium on Workload Characterization, 2015

Turning Centralized Coherence and Distributed Critical-Section Execution on their Head: A New Approach for Scalable Distributed Shared Memory.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

Hierarchical private/shared classification: The key to simple and efficient coherence for clustered cache hierarchies.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Fast&Furious: A Tool for Detecting Covert Racing.
Proceedings of the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015

An Efficient, Self-Contained, On-chip Directory: DIR1-SISD.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Power-Efficient Computer Architectures: Recent Advances
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01745-2, 2014

Managing power constraints in a single-core scenario through power tokens.
J. Supercomput., 2014

A tunable cache for approximate computing.
Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2014

Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

2013
Efficient inter-core power and thermal balancing for multicore processors.
Computing, 2013

Introducing DVFS-Management in a Full-System Simulator.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

A new perspective for efficient virtual-cache coherence.
Proceedings of the 40th Annual International Symposium on Computer Architecture, 2013

Towards more efficient execution: a decoupled access-execute approach.
Proceedings of the International Conference on Supercomputing, 2013

2012
Efficient, snoopless, System-on-Chip coherence.
Proceedings of the IEEE 25th International SOC Conference, 2012

A framework for efficient cache resizing.
Proceedings of the 2012 International Conference on Embedded Computer Systems: Architectures, 2012

Power-Sleuth: A Tool for Investigating Your Program's Power Behavior.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Embedded reconfigurable architectures.
Proceedings of the 15th International Conference on Compilers, 2012

Complexity-effective multicore coherence.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Leakage-efficient design of value predictors through state and non-state preserving techniques.
J. Supercomput., 2011

Power Token Balancing: Adapting CMPs to Power Constraints for Parallel Multithreaded Workloads.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Poster: DVFS management in real-processors.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Green governors: A framework for Continuously Adaptive DVFS.
Proceedings of the 2011 International Green Computing Conference and Workshops, 2011

Token3D: Reducing Temperature in 3D Die-Stacked CMPs through Cycle-Level Power Control Mechanisms.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Multicore Cache Simulations Using Heterogeneous Computing on General Purpose and Graphics Processors.
Proceedings of the 14th Euromicro Conference on Digital System Design, 2011

2010
SARC Coherence: Scaling Directory Cache Coherence in Performance and Power.
IEEE Micro, 2010

Teaching Introduction to Computing Through a Project-Based Collaborative Learning Approach.
Proceedings of the 14th Panhellenic Conference on Informatics, 2010

Interval-based models for run-time DVFS orchestration in superscalar processors.
Proceedings of the 7th Conference on Computing Frontiers, 2010

Where replacement algorithms fail: a thorough analysis.
Proceedings of the 7th Conference on Computing Frontiers, 2010

MLP-Aware Instruction Queue Resizing: The Key to Power-Efficient Performance.
Proceedings of the Architecture of Computing Systems, 2010

2009
Recruiting Decay for Dynamic Power Reduction in Set-Associative Caches.
Trans. High Perform. Embed. Archit. Compil., 2009

Instruction-based reuse-distance prediction for effective cache management.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Efficient microarchitecture policies for accurately adapting to power constraints.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Instruction Precomputation for Fault Detection.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

2008
Computer Architecture Techniques for Power-Efficiency
Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01721-6, 2008

Non deterministic caches: a simple and effective defense against side channel attacks.
Des. Autom. Embed. Syst., 2008

Low power microarchitecture with instruction reuse.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Cache replacement based on reuse-distance prediction.
Proceedings of the 25th International Conference on Computer Design, 2007

Applying Decay to Reduce Dynamic Power in Set-Associative Caches.
Proceedings of the High Performance Embedded Architectures and Compilers, 2007

Using value locality to reduce memory encryption overhead in embedded processors.
Proceedings of 12th IEEE International Conference on Emerging Technologies and Factory Automation, 2007

Adaptive VP decay: making value predictors leakage-efficient designs for high performance processors.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
Preventing Denial-of-Service Attacks in Shared CMP Caches.
Proceedings of the Embedded Computer Systems: Architectures, 2006

Modeling Cache Sharing on Chip Multiprocessor Architectures.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Topic 18: Embedded Parallel Systems.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Dynamic Dictionary-Based Data Compression for Level-1 Caches.
Proceedings of the Architecture of Computing Systems, 2006

2005
A simple mechanism to adapt leakage-control policies to temperature.
Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005

IPStash: a set-associative memory approach for efficient IP-lookup.
Proceedings of the INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies, 2005

2004
Implementing branch-predictor decay using quasi-static memory cells.
ACM Trans. Archit. Code Optim., 2004

4T-decay sensors: a new class of small, fast, robust, and low-power, temperature/leakage sensors.
Proceedings of the 2004 International Symposium on Low Power Electronics and Design, 2004

2003
IPStash: a Power-Efficient Memory Architecture for IP-lookup.
Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

TCP: Tag Correlating Prefetchers.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2002
Let caches decay: reducing leakage energy via exploitation of cache generational behavior.
ACM Trans. Comput. Syst., 2002

Implementing Decay Techniques using 4T Quasi-Static Memory Cells.
IEEE Comput. Archit. Lett., 2002

Managing leakage for transient data: decay and quasi-static 4T memory cells.
Proceedings of the 2002 International Symposium on Low Power Electronics and Design, 2002

Timekeeping in the Memory System: Predicting and Optimizing Memory Behavior.
Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

2001
Cache decay: exploiting generational behavior to reduce cache leakage power.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads.
Proceedings of the 2001 International Conference on Compilers, 2001

2000
Distributed vector architectures.
J. Syst. Archit., 2000

Cache-Line Decay: A Mechanism to Reduce Cache Leakage Power.
Proceedings of the Power-Aware Computer Systems, First International Workshop, 2000

Coherence Communication Prediction in Shared-Memory Multiprocessors.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

1999
DataScalar: A memory-centric approach to computing.
J. Syst. Archit., 1999

Improving CC-NUMA Performance Using Instruction-Based Prediction.
Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

1998
A Study of Three Dynamic Approaches to Handle Widely Shared Data in Shared-memory Multiprocessors.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
DataScalar Architectures.
Proceedings of the 24th International Symposium on Computer Architecture, 1997

1996
Kiloprocessor Extensions to SCI.
Proceedings of IPPS '96, 1996

The GLOW Cache Coherence Protocol Extensions for Widely Shared Data.
Proceedings of the 10th international conference on Supercomputing, 1996

1992
PSM: software tool for simulating, prototyping, and monitoring of multiprocessor systems.
Inf. Softw. Technol., 1992

1991
A Prolog-based design environment for the high-level synthesis of application-specific architectures.
Microprocessing and Microprogramming, 1991

