Magnus Själander

IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., October, 2024

R-HLS: An IR for Dynamic High-Level Synthesis and Memory Disambiguation based on Regions and State Edges.

David Metz

Nico Reissmann

CoRR, 2024

CATBench: A Compiler Autotuning Benchmarking Suite for Black-box Optimization.

CoRR, 2024

TEEMO: Temperature Aware Energy Efficient Multi-Retention STT-RAM Cache Architecture.

Sukarn Agarwal

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

MAFin: Maximizing Accuracy in FinFET based Approximated Real-Time Computing.

Sangeet Saha

Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

2023

BISDU: A Bit-Serial Dot-Product Unit for Microcontrollers.

David Metz

Vineet Kumar

ACM Trans. Embed. Comput. Syst., September, 2023

Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks.

Christos Sakalis

ACM Trans. Archit. Code Optim., March, 2023

DELICIOUS: Deadline-Aware Approximate Computing in Cache-Conscious Multicore.

IEEE Trans. Parallel Distributed Syst., February, 2023

ReCon: Efficient Detection, Management, and Use of Non-Speculative Information Leakage.

Pavlos Aimoniotis

Amund Bergland Kvalsvik

Xiaoyue Chen

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023

Doppelganger Loads: A Safe, Complexity-Effective Optimization for Secure Speculation Schemes.

Amund Bergland Kvalsvik

Pavlos Aimoniotis

Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023

Architecting Selective Refresh based Multi-Retention Cache for Heterogeneous System (ARMOUR).

Sukarn Agarwal

Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

2022

Data-Out Instruction-In (DOIN!): Leveraging Inclusive Caches to Attack Speculative Delay Schemes.

Pavlos Aimoniotis

Amund Bergland Kvalsvik

Proceedings of the 2022 IEEE International Symposium on Secure and Private Execution Environment Design (SEED), 2022

STIFF: thermally safe temperature effect inversion aware FinFET based multi-core.

Vassos Soteriou

Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

2021

Prepare: Power-Aware Approximate Real-time Task Scheduling for Energy-Adaptive QoS Maximization.

Sangeet Saha

ACM Trans. Embed. Comput. Syst., 2021

WaFFLe: Gated Cache-Ways with Per-Core Fine-Grained DVFS for Reduced On-Chip Temperature and Leakage Consumption.

ACM Trans. Archit. Code Optim., 2021

"It's a Trap!"-How Speculation Invariance Can Be Abused with Forward Speculative Interference.

CoRR, 2021

Selectively Delaying Instructions to Prevent Microarchitectural Replay Attacks.

Christos Sakalis

CoRR, 2021

On Value Recomputation to Accelerate Invisible Speculation.

CoRR, 2021

Reorder Buffer Contention: A Forward Speculative Interference Attack for Speculation Invariant Instructions.

IEEE Comput. Archit. Lett., 2021

Seeds of SEED: Preventing Priority Inversion in Instruction Scheduling to Disrupt Speculative Interference.

Christos Sakalis

Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

Do Not Predict - Recompute! How Value Recomputation Can Truly Boost the Performance of Invisible Speculation.

Proceedings of the 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021

2020

Understanding Selective Delay as a Method for Efficient Secure Speculative Execution.

IEEE Trans. Computers, 2020

Evaluating the Potential Applications of Quaternary Logic for Approximate Computing.

ACM J. Emerg. Technol. Comput. Syst., 2020

Twig: Multi-Agent Task Management for Colocated Latency-Critical Cloud Services.

Proceedings of the IEEE International Symposium on High Performance Computer Architecture, 2020

RePAiR: A Strategy for Reducing Peak Temperature while Maximising Accuracy of Approximate Real-Time Computing: Work-in-Progress.

Sangeet Saha

Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis, 2020

Clearing the Shadows: Recovering Lost Performance for Invisible Speculative Execution through HW/SW Co-Design.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing.

ACM Trans. Reconfigurable Technol. Syst., 2019

EPIC: An Energy-Efficient, High-Performance GPGPU Computing Research Infrastructure.

CoRR, 2019

RVSDG: An Intermediate Representation for Optimizing Compilers.

CoRR, 2019

Efficient invisible speculative execution through selective delay and value prediction.

Proceedings of the 46th International Symposium on Computer Architecture, 2019

Improving Memory Access Locality for Vectorized Bit-Serial Matrix Multiplication in Reconfigurable Computing.

Lahiru Rasnayake

Proceedings of the International Conference on Field-Programmable Technology, 2019

Ghost loads: what is the cost of invisible speculation?

Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

2018

Static Instruction Scheduling for High Performance on Limited Hardware.

Vasileios Spiliopoulos

Alexandra Jimborean

IEEE Trans. Computers, 2018

SWOOP: software-hardware co-design for non-speculative, execute-ahead, in-order cores.

Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

BISMO: A Scalable Bit-Serial Matrix Multiplication Overlay for Reconfigurable Computing.

Yaman Umuroglu

Lahiru Rasnayake

Proceedings of the 28th International Conference on Field Programmable Logic and Applications, 2018

2017

Transcending Hardware Limits with Software Out-of-Order Processing.

IEEE Comput. Archit. Lett., 2017

Clairvoyance: look-ahead compile-time scheduling.

Vasileios Spiliopoulos

Alexandra Jimborean

Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016

Poster: Approximation: A New Paradigm also for Wireless Sensing.

Thiemo Voigt

Frederik Hermans

Proceedings of the International Conference on Embedded Wireless Systems and Networks, 2016

Practical way halting by speculatively accessing halt tags.

Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Techniques for modulating error resilience in emerging multi-value technologies.

Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Redesigning a tagless access buffer to require minimal ISA changes.

Proceedings of the 2016 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 2016

2015

Improving Data Access Efficiency by Using Context-Aware Loads and Stores.

Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, 2015

Optimizing Transfers of Control in the Static Pipeline Architecture.

Proceedings of the 16th ACM SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, 2015

Scheduling instruction effects for a statically pipelined processor.

Proceedings of the 2015 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 2015

2014

Power-Efficient Computer Architectures: Recent Advances

Margaret Martonosi

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01745-2, 2014

A tunable cache for approximate computing.

Nina Shariati Nilsson

Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, 2014

Reducing set-associative L1 data cache energy by early load data dependence detection (ELD³).

Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

2013

Reducing instruction fetch energy in multi-issue processors.

Peter Gavin

David B. Whalley

ACM Trans. Archit. Code Optim., 2013

Designing a practical data filter cache to improve both energy efficiency and performance.

ACM Trans. Archit. Code Optim., 2013

FlexCore: Implementing an exposed datapath processor.

Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, 2013

Improving processor efficiency by statically pipelining instructions.

Proceedings of the SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, 2013

Speculative tag access for reduced energy dissipation in set-associative L1 data caches.

Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Improving data access efficiency by using a tagless access buffer (TAB).

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012

Techniques to Measure, Model, and Manage Power.

Bhavishya Goel

Sally A. McKee

Adv. Comput., 2012

Configurable RTL model for level-1 caches.

Proceedings of NORCHIP 2012, Copenhagen, Denmark, November 12-13, 2012

An LTE Uplink Receiver PHY benchmark and subframe-based power management.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Viterbi Accelerator for Embedded Processor Datapaths.

Kashan Khurshid Ansari

Proceedings of the 23rd IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2012

2011

Power-Aware Resource Scheduling in Base Stations.

Proceedings of MASCOTS 2011, 2011

Reconfigurable Instruction Decoding for a Wide-Control-Word Processor.

Alen Bardizbanyan

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

A High-Speed, Energy-Efficient Two-Cycle Multiply-Accumulate (MAC) Architecture and Its Application to a Double-Throughput MAC Unit.

IEEE Trans. Circuits Syst. I Regul. Pap., 2010

Design space exploration for an embedded processor with flexible datapath interconnect.

Ulf Jalmbrant

Erik der Hagopian

Kasyab P. Subramaniyan

Proceedings of the 21st IEEE International Conference on Application-specific Systems Architectures and Processors, 2010

2009

FlexCore: Utilizing Exposed Datapath Control for Efficient Computing.

J. Signal Process. Syst., 2009

Multiplication Acceleration Through Twin Precision.

IEEE Trans. Very Large Scale Integr. Syst., 2009

High-speed, energy-efficient 2-cycle Multiply-Accumulate architecture.

Proceedings of the Annual IEEE International SoC Conference, SoCC 2009, 2009

Scheduling for an Embedded Architecture with a Flexible Datapath.

Thomas Schilling

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2009

Double Throughput Multiply-Accumulate unit for FlexCore processor enhancements.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Custom layout strategy for rectangle-shaped log-depth multiplier reduction tree.

Patrik Kimfors

Niklas Broman

Andreas Haraldsson

Kasyab P. Subramaniyan

Henrik Eriksson

Proceedings of the 16th IEEE International Conference on Electronics, Circuits and Systems, 2009

A Flexible Code Compression Scheme Using Partitioned Look-Up Tables.

Martin Thuresson

Per Stenström

Proceedings of the International Conference on High Performance Embedded Architectures and Compilers, 2009

2008

Early detection and bypassing of trivial operations to improve energy efficiency of processors.

Md. Mafijul Islam

Per Stenström

Microprocess. Microsystems, 2008

High-speed and low-power multipliers using the Baugh-Wooley algorithm and HPM reduction tree.

Proceedings of the 15th IEEE International Conference on Electronics, Circuits and Systems, 2008

A Look-Ahead Task Management Unit for Embedded Multi-Core Architectures.

Andrei Sergeevich Terechko

Marc Duranton

Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, 2008

2007

A Flexible Datapath Interconnect for Embedded Applications.

Magnus Björk

Proceedings of the 2007 IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2007), 2007

2006

Multiplier reduction tree with logarithmic logic depth and regular connectivity.

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

2005

A low-leakage twin-precision multiplier using reconfigurable power gating.

Mindaugas Drazdziulis

Henrik Eriksson

Proceedings of the International Symposium on Circuits and Systems (ISCAS 2005), 2005

2004

An Efficient Twin-Precision Multiplier.

Henrik Eriksson