Sotirios Xydis

ORCID: 0000-0003-3151-2730

According to our database, Sotirios Xydis authored at least 134 papers between 2007 and 2024.

AI-Driven QoS-Aware Scheduling for Serverless Video Analytics at the Edge.
Inf., August, 2024

Orchestration Extensions for Interference- and Heterogeneity-Aware Placement for Data-Analytics.
Int. J. Parallel Program., August, 2024

Cometes: Cross-Device Mapping for Energy and Time-Aware Deployment on Edge Infrastructures.
IEEE Embed. Syst. Lett., June, 2024

CollectiveHLS: Ultrafast Knowledge-Based HLS Design Optimization.
IEEE Embed. Syst. Lett., June, 2024

Sparkle: Deep Learning Driven Autotuning for Taming High-Dimensionality of Spark Deployments.
IEEE Trans. Cloud Comput., 2024

Dataflow Optimized Reconfigurable Acceleration for FEM-based CFD Simulations.
CoRR, 2024

SLO-aware GPU Frequency Scaling for Energy Efficient LLM Inference Serving.
CoRR, 2024

Leveraging Core and Uncore Frequency Scaling for Power-Efficient Serverless Workflows.
CoRR, 2024

Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations.
CoRR, 2024

SLO-Aware GPU DVFS for Energy-Efficient LLM Inference Serving.
IEEE Comput. Archit. Lett., 2024

GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction.
Proceedings of the 51st ACM/IEEE Annual International Symposium on Computer Architecture, 2024

Disaggregated RDDs: Extending and Analyzing Apache Spark for Memory Disaggregated Infrastructures.
Proceedings of the IEEE International Conference on Cloud Engineering, 2024

Dynamic Frequency Boosting of RISC-V FPSoCs Through Monitoring Runtime Path Activations.
Proceedings of the 27th Euromicro Conference on Digital System Design, 2024

Auto-tuning Multi-GPU High-Fidelity Numerical Simulations for Urban Air Mobility.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Decoupled Access-Execute Enabled DVFS for TinyML Deployments on STM32 Microcontrollers.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2024

Late Breaking Results: Language-level QoR modeling for High-Level Synthesis.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Data-driven HLS optimization for reconfigurable accelerators.
Proceedings of the 61st ACM/IEEE Design Automation Conference, 2024

Viewing Allocators as Bin Packing Solvers Demystifies Fragmentation.
CoRR, 2023

DVFaaS: Leveraging DVFS for FaaS Workflows.
IEEE Comput. Archit. Lett., 2023

Beyond RSS: Towards Intelligent Dynamic Memory Management (Work in Progress).
Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, 2023

The Unexpected Efficiency of Bin Packing Algorithms for Dynamic Storage Allocation in the Wild: An Intellectual Abstract.
Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 2023

Adrias: Interference-Aware Memory Orchestration for Disaggregated Cloud Infrastructures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2023

Profile-Driven Banded Smith-Waterman acceleration for Short Read Alignment.
Proceedings of the 60th ACM/IEEE Design Automation Conference, 2023

Darly: Deep Reinforcement Learning for QoS-aware scheduling under resource heterogeneity Optimizing serverless video analytics.
Proceedings of the 16th IEEE International Conference on Cloud Computing, 2023

Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order Execution.
IEEE Trans. Parallel Distributed Syst., 2022

Enabling Large Scale Simulations for Particle Accelerators.
IEEE Trans. Parallel Distributed Syst., 2022

GANDAFL: Dataflow Acceleration for Short Read Alignment on NGS Data.
IEEE Trans. Computers, 2022

High Level Synthesis Acceleration of Change Detection in Multi-Temporal High Resolution Sentinel-2 Satellite Images.
Proceedings of the 30th IFIP/IEEE International Conference on Very Large Scale Integration, 2022

MAx-DNN: Multi-Level Arithmetic Approximation for Energy-Efficient DNN Hardware Accelerators.
Proceedings of the 13th IEEE Latin America Symposium on Circuits and Systems, 2022

Dynamic Frequency Boosting Beyond Critical Path Delay.
Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design, 2022

Sequence Clock: A Dynamic Resource Orchestrator for Serverless Architectures.
Proceedings of the IEEE 15th International Conference on Cloud Computing, 2022

Rusty: Runtime Interference-Aware Predictive Monitoring for Modern Multi-Tenant Systems.
IEEE Trans. Parallel Distributed Syst., 2021

FaaS and Curious: Performance Implications of Serverless Functions on Edge Computing Platforms.
Proceedings of the High Performance Computing - ISC High Performance Digital 2021 International Workshops, Frankfurt am Main, Germany, June 24, 2021

FADE: FaaS-inspired application decomposition and Energy-aware function placement on the Edge.
Proceedings of the SCOPES '21: 24th International Workshop on Software and Compilers for Embedded Systems, Eindhoven, The Netherlands, November 1, 2021

Interference-Aware Workload Placement for Improving Latency Distribution of Converged HPC/Big Data Cloud Infrastructures.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

Resource Aware GPU Scheduling in Kubernetes Infrastructure.
Proceedings of the 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2021

FPGA Acceleration of Short Read Alignment.
Proceedings of the HEART '21: 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2021

FPGA acceleration in EVOLVE's Converged Cloud-HPC Infrastructure.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

Performance Analysis and Auto-tuning for SPARK in-memory analytics.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2021

Interference-Aware Orchestration in Kubernetes.
Proceedings of the High Performance Computing, 2020

Exploration of GPU sharing policies under GEMM workloads.
Proceedings of the SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems, 2020

Resource-Aware MapReduce Runtime for Multi/Many-core Architectures.
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020

DDOT: Data Driven Online Tuning for energy efficient acceleration.
Proceedings of the 57th ACM/IEEE Design Automation Conference, 2020

Scale-out beam longitudinal dynamics simulations.
Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020

Oops: Optimizing Operation-mode Selection for IoT Edge Devices.
ACM Trans. Internet Techn., 2019

Multi-Level Approximate Accelerator Synthesis Under Voltage Island Constraints.
IEEE Trans. Circuits Syst. II Express Briefs, 2019

Workload- and process-variation aware voltage/frequency tuning for energy efficient performance sustainability of NTC manycores.
Integr., 2019

Energy-efficient VLSI implementation of multipliers with double LSB operands.
IET Circuits Devices Syst., 2019

Rusty: Runtime System Predictability Leveraging LSTM Neural Networks.
IEEE Comput. Archit. Lett., 2019

LOOG: Improving GPU Efficiency With Light-Weight Out-Of-Order Execution.
IEEE Comput. Archit. Lett., 2019

TF2FPGA: A Framework for Projecting and Accelerating Tensorflow CNNs on FPGA Platforms.
Proceedings of the 8th International Conference on Modern Circuits and Systems Technologies, 2019

Dataflow Acceleration of Smith-Waterman with Traceback for High Throughput Next Generation Sequencing.
Proceedings of the 29th International Conference on Field Programmable Logic and Applications, 2019

Co-design Implications of Cost-effective On-demand Acceleration for Cloud Healthcare Analytics: The AEGLE approach.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Cooperative Arithmetic-Aware Approximation Techniques for Energy-Efficient Multipliers.
Proceedings of the 56th Annual Design Automation Conference 2019, 2019

VOSsim: A Framework for Enabling Fast Voltage Overscaling Simulation for Approximate Computing Circuits.
IEEE Trans. Very Large Scale Integr. Syst., 2018

Application-Arrival Rate Aware Distributed Run-Time Resource Management for Many-Core Computing Platforms.
IEEE Trans. Multi Scale Comput. Syst., 2018

OpenCL-based Virtual Prototyping and Simulation of Many-Accelerator Architectures.
ACM Trans. Embed. Comput. Syst., 2018

Distributed Trade-Based Edge Device Management in Multi-Gateway IoT.
ACM Trans. Cyber Phys. Syst., 2018

Walking through the Energy-Error Pareto Frontier of Approximate Multipliers.
IEEE Micro, 2018

Decoupled MapReduce for Shared-Memory Multi-Core Architectures.
IEEE Comput. Archit. Lett., 2018

CF-TUNE: Collaborative Filtering Auto-Tuning for Energy Efficient Many-Core Processors.
IEEE Comput. Archit. Lett., 2018

BLonD++: performance analysis and optimizations for enabling complex, accurate and fast beam dynamics studies.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

An Exploration Framework for Efficient High-Level Synthesis of Support Vector Machines: Case Study on ECG Arrhythmia Detection for Xilinx Zynq SoC.
J. Signal Process. Syst., 2017

SoftRM: Self-Organized Fault-Tolerant Resource Management for Failure Detection and Recovery in NoC Based Many-Cores.
ACM Trans. Embed. Comput. Syst., 2017

AEGLE's Cloud Infrastructure for Resource Monitoring and Containerized Accelerated Analytics.
Proceedings of the 2017 IEEE Computer Society Annual Symposium on VLSI, 2017

Dataflow Acceleration of scikit-learn Gaussian Process Regression.
Proceedings of the 8th Workshop and 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, 2017

Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation.
IEEE Trans. Very Large Scale Integr. Syst., 2016

Flexible DSP Accelerator Architecture Exploiting Carry-Save Arithmetic.
IEEE Trans. Very Large Scale Integr. Syst., 2016

A Framework for Interconnection-Aware Domain-Specific Many-Accelerator Synthesis.
ACM Trans. Embed. Comput. Syst., 2016

An Integrated Exploration and Virtual Platform Framework for Many-Accelerator Heterogeneous Systems.
ACM Trans. Embed. Comput. Syst., 2016

Effective Learning and Filtering of Faulty Heart-Beats for Advanced ECG Arrhythmia Detection using MIT-BIH Database.
EAI Endorsed Trans. Pervasive Health Technol., 2016

Computation offloading and resource allocation for low-power IoT edge devices.
Proceedings of the 3rd IEEE World Forum on Internet of Things, 2016

Performance-power exploration of software-defined big data analytics: The AEGLE cloud backend.
Proceedings of the International Conference on Embedded Computer Systems: Architectures, 2016

Throughput balancing for energy efficient near-threshold manycores.
Proceedings of the 26th International Workshop on Power and Timing Modeling, 2016

Energy profile analysis of Zynq-7000 programmable SoC for embedded medical processing: Study on ECG arrhythmia detection.
Proceedings of the 26th International Workshop on Power and Timing Modeling, 2016

Deploying and monitoring hadoop MapReduce analytics on single-chip cloud computer.
Proceedings of the 7th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 5th Workshop on Design Tools and Architectures For Multicore Embedded Computing Platforms, 2016

Distributed QoS management for internet of things under resource constraints.
Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2016

Variability-Aware Voltage Island Management for Near-Threshold Computing with Performance Guarantees.
Proceedings of the Near Threshold Computing, Technology, Methods and Applications, 2016

GENESIS: Parallel Application Placement onto Reconfigurable Architectures (Invited for the Special Issue on Runtime Management).
ACM Trans. Embed. Comput. Syst., 2015

SPIRIT: Spectral-Aware Pareto Iterative Refinement Optimization for Supervised High-Level Synthesis.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2015

Mitigating Memory-Induced Dark Silicon in Many-Accelerator Architectures.
IEEE Comput. Archit. Lett., 2015

A virtual platform for exploring hierarchical interconnection for many-accelerator systems.
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Hybrid approximate multiplier architectures for improved power-accuracy trade-offs.
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design, 2015

Approximate Multiplier Architectures Through Partial Product Perforation: Power-Area Tradeoffs Analysis.
Proceedings of the 25th edition on Great Lakes Symposium on VLSI, GLVLSI 2015, Pittsburgh, PA, USA, May 20, 2015

Rapid prototyping and Design Space Exploration methodologies for many-accelerator systems.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

High-Level-Synthesis extensions for scalable Single-Chip Many-Accelerators on FPGAs.
Proceedings of the 25th International Conference on Field Programmable Logic and Applications, 2015

Job-Arrival Aware Distributed Run-Time Resource Management on Intel SCC Manycore Platform.
Proceedings of the 13th IEEE International Conference on Embedded and Ubiquitous Computing, 2015

SWAN-iCARE Project: On the Efficiency of FPGAs Emulating Wearable Medical Devices for Wound Management and Monitoring.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

Reconfigurable Computing for Analytics Acceleration of Big Bio-Data: The AEGLE Approach.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

Dynamic Memory Management in Vivado-HLS for Scalable Many-Accelerator Architectures.
Proceedings of the Applied Reconfigurable Computing - 11th International Symposium, 2015

An Optimized Modified Booth Recoder for Efficient Design of the Add-Multiply Operator.
IEEE Trans. Circuits Syst. I Regul. Pap., 2014

Systematic Design and Evaluation of Reconfigurable Arithmetic Components in the Deep submicron Domain.
J. Circuits Syst. Comput., 2014

Co-design of many-accelerator heterogeneous systems exploiting virtual platforms.
Proceedings of the XIVth International Conference on Embedded Computer Systems: Architectures, 2014

A HW/SW framework emulating wearable devices for remote wound monitoring and management.
Proceedings of the 4th International Conference on Wireless Mobile Communication and Healthcare: "Transforming healthcare through innovations in mobile and wireless technologies", 2014

SWAN-iCare project: Towards smart wearable and autonomous negative pressure device for wound monitoring and therapy.
Proceedings of the 4th International Conference on Wireless Mobile Communication and Healthcare: "Transforming healthcare through innovations in mobile and wireless technologies", 2014

Hardware accelerated rician denoise algorithm for high performance magnetic resonance imaging.
Proceedings of the 4th International Conference on Wireless Mobile Communication and Healthcare: "Transforming healthcare through innovations in mobile and wireless technologies", 2014

Patient expectations "vis-à-vis" an innovative remote therapeutic device: Case of chronic wounds in diabetic patients.
Proceedings of the 4th International Conference on Wireless Mobile Communication and Healthcare: "Transforming healthcare through innovations in mobile and wireless technologies", 2014

Effective Platform-Level Exploration for Heterogeneous Multicores Exploiting Simulation-Induced Slacks.
Proceedings of the 5th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 3rd Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2014

Voltage island management in near threshold manycore architectures to mitigate dark silicon.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Variation-aware voltage island formation for power efficient near-threshold manycore architectures.
Proceedings of the 19th Asia and South Pacific Design Automation Conference, 2014

A framework for Compiler Level statistical analysis over customized VLIW architecture.
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013

A meta-model assisted coprocessor synthesis framework for compiler/architecture parameters customization.
Proceedings of the Design, Automation and Test in Europe, 2013

Thermal-aware datapath merging for coarse-grained reconfigurable processors.
Proceedings of the Design, Automation and Test in Europe, 2013

A Systematic Methodology for Reliability Improvements on SoC-Based Software Defined Radio Systems.
VLSI Design, 2012

Compiler-in-the-loop exploration during datapath synthesis for higher quality delay-area trade-offs.
ACM Trans. Design Autom. Electr. Syst., 2012

High Performance and Area Efficient Flexible DSP Datapath Synthesis.
IEEE Trans. Very Large Scale Integr. Syst., 2011

Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations.
IEEE Embed. Syst. Lett., 2011

Thermal optimization for micro-architectures through selective block replication.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Runtime Tuning of Dynamic Memory Management For Mitigating Footprint-Fragmentation Variations.
Proceedings of the ARCS 2011, 2011

Custom multi-threaded Dynamic Memory Management for Multiprocessor System-on-Chip platforms.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Efficient High Level Synthesis Exploration Methodology Combining Exhaustive and Gradient-Based Pruned Searching.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2010

A High Level Synthesis Exploration Framework with Iterative Design Space Partitioning.
Proceedings of the VLSI 2010 Annual Symposium - Selected papers, 2010

High-Level Synthesis Methodologies for Delay-Area Optimized Coarse-Grained Reconfigurable Coprocessor Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2010

Designing efficient DSP datapaths through compiler-in-the-loop exploration methodology.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

Construction of dual mode components for reconfiguration aware high-level synthesis.
Proceedings of the Design, Automation and Test in Europe, 2010

Designing coarse-grain reconfigurable architectures by inlining flexibility into custom arithmetic data-paths.
Integr., 2009

High-level synthesis with coarse grain reconfigurable components.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A design methodology for high-performance and low-leakage fixed-point transpose FIR filters.
Proceedings of the 16th IEEE International Conference on Electronics, 2009

Optimized Reconfigurable RTL Components for Performance Improvements During High-Level Synthesis.
Proceedings of the 12th Euromicro Conference on Digital System Design, 2009

Flexible Datapath Synthesis through Arithmetically Optimized Operation Chaining.
Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems, 2009

A flexible architecture for DSP applications combining high performance arithmetic with small scale configurability.
Proceedings of the 2008 16th European Signal Processing Conference, 2008

A Scheduling Postprocessor to Exploit Morphable RTL Components During High-Level Synthesis.
Proceedings of the 11th Euromicro Conference on Digital System Design: Architectures, 2008

Mapping DSP Applications onto High-Performance Architectural Templates with Inlined Flexibility.
Proceedings of the NASA/ESA Conference on Adaptive Hardware and Systems, 2008

Flexibility Inlining into Arithmetic Data-paths Exploiting A Regular Interconnection Scheme.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

Run-time reconfigurable solutions for adaptive control applications.
Proceedings of the ICINCO 2007, 2007

A regular interconnection scheme for efficient mapping of DSP kernels into reconfigurable hardware.
Proceedings of the 15th European Signal Processing Conference, 2007

High-level synthesis heuristics for run-time reconfigurable architectures.
Proceedings of the 15th European Signal Processing Conference, 2007

A Reconfigurable Arithmetic Data-path Based On Regular Interconnection.
Proceedings of the Second NASA/ESA Conference on Adaptive Hardware and Systems (AHS 2007), 2007
