Georgios I. Goumas

Orcid: 0000-0001-7811-4831

Affiliations:
  • National Technical University of Athens (NTUA), Greece


According to our database, Georgios I. Goumas authored at least 99 papers between 2000 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
eBPF-mm: Userspace-guided memory management in Linux with eBPF.
CoRR, 2024

SmartPQ: An Adaptive Concurrent Priority Queue for NUMA Architectures.
CoRR, 2024

Elastic Translations: Fast Virtual Memory with Multiple Translation Sizes.
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024

Uncut-GEMMs: Communication-Aware Matrix Multiplication on Multi-GPU Nodes.
Proceedings of the IEEE International Conference on Cluster Computing, 2024

Open-Source SpMV Multiplication Hardware Accelerator for FPGA-Based HPC Systems.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2024

2023
PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems.
ACM Trans. Archit. Code Optim., December, 2023

High-performance and balanced parallel graph coloring on multicore platforms.
J. Supercomput., April, 2023

DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems.
Proc. ACM Meas. Anal. Comput. Syst., March, 2023

Architectural Support for Efficient Data Movement in Disaggregated Systems.
CoRR, 2023

DaeMon: Architectural Support for Efficient Data Movement in Disaggregated Systems.
CoRR, 2023

Architectural Support for Efficient Data Movement in Fully Disaggregated Systems.
Proceedings of the Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2023

Feature-based SpMV Performance Analysis on Contemporary Devices.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Invited paper: An Artificial Matrix Generator for Multi-platform SpMV Performance Analysis.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

2022
SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proc. ACM Meas. Anal. Comput. Syst., 2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems.
CoRR, 2022

FaaS in the age of (sub-)μs I/O: a performance analysis of snapshotting.
Proceedings of the SYSTOR '22: The 15th ACM International Systems and Storage Conference, Haifa, Israel, June 13, 2022

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the SIGMETRICS/PERFORMANCE '22: ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Mumbai, India, June 6, 2022

DaxVM: Stressing the Limits of Memory as a File Interface.
Proceedings of the 55th IEEE/ACM International Symposium on Microarchitecture, 2022

SparseP: Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2022


Deverlay: Container Snapshots For Virtual Machines.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
RCU-HTM: A generic synchronization technique for highly efficient concurrent search trees.
Concurr. Comput. Pract. Exp., 2021

Modeling the Scalability of the EuroExa Reconfigurable Accelerators - Preliminary Results - Invited Paper.
Proceedings of the Embedded Computer Systems: Architectures, Modeling, and Simulation, 2021

CoCoPeLia: Communication-Computation Overlap Prediction for Efficient Linear Algebra on GPUs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2021

Online Weight Pruning Via Adaptive Sparsity Loss.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2021

2020
Weight Pruning via Adaptive Sparsity Loss.
CoRR, 2020

Leveraging Blockchain Technology to Break the Cloud Computing Market Monopoly.
Comput., 2020

Efficient Concurrent Range Queries in B+-trees using RCU-HTM.
Proceedings of the SPAA '20: 32nd ACM Symposium on Parallelism in Algorithms and Architectures, 2020

Enhancing and Exploiting Contiguity for Fast Memory Virtualization.
Proceedings of the 47th ACM/IEEE Annual International Symposium on Computer Architecture, 2020

2019
Efficient accelerator sharing in virtualized environments: A Xeon Phi use-case.
J. Syst. Softw., 2019

Building Ad-Hoc Clouds with CloudAgora.
Proceedings of the 38th Symposium on Reliable Distributed Systems, 2019

Conflict-free symmetric sparse matrix-vector multiplication on multicore architectures.
Proceedings of the International Conference for High Performance Computing, 2019

BASMAT: bottleneck-aware sparse matrix-vector multiplication auto-tuning on GPGPUs.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

On the Performance and Energy Efficiency of Sparse Matrix-Vector Multiplication on FPGAs.
Proceedings of the Parallel Computing: Technology Trends, 2019

ACTiManager: An end-to-end interference-aware cloud resource manager.
Proceedings of the 20th International Middleware Conference Demos and Posters, 2019

DICER: Diligent Cache Partitioning for Efficient Workload Consolidation.
Proceedings of the 48th International Conference on Parallel Processing, 2019

CloudAgora: Democratizing the Cloud.
Proceedings of the Blockchain - ICBC 2019, 2019

An adaptive concurrent priority queue for NUMA architectures.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

RecNets: Channel-wise Recurrent Convolutional Neural Networks.
Proceedings of the 30th British Machine Vision Conference 2019, 2019

2018
SparseX: A Library for High-Performance Sparse Matrix-Vector Multiplication on Multicore Platforms.
ACM Trans. Math. Softw., 2018

A distributed modular platform for the development of cloud based applications.
Future Gener. Comput. Syst., 2018

Combining HTM with RCU to Speed Up Graph Coloring on Multicore Platforms.
Proceedings of the High Performance Computing - 33rd International Conference, 2018

Efficient resource management for data centers: the ACTiCLOUD approach.
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

RACCEX: Towards Remote Accelerated Computing Environments.
Proceedings of the 2018 IEEE International Conference on Cloud Computing Technology and Science, 2018

Performance Prediction of NUMA Placement: A Machine-Learning Approach.
Proceedings of the 2018 IEEE International Conference on Cloud Computing Technology and Science, 2018

2017
Predictive communication modeling for HPC applications.
Clust. Comput., 2017

An efficient and fair scheduling policy for multiprocessor platforms.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2017

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Intel Xeon Phi.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors.
Proceedings of the 46th International Conference on Parallel Processing, 2017


Improving QoS and Utilisation in modern multi-core servers with Dynamic Cache Partitioning.
Proceedings of the Joined Workshops COSH 2017 and VisorHPC 2017, 2017

BONSEYES: Platform for Open Development of Systems of Artificial Intelligence: Invited paper.
Proceedings of the Computing Frontiers Conference, 2017

RCU-HTM: Combining RCU with HTM to Implement Highly Efficient Concurrent Binary Search Trees.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Improving virtual host efficiency through resource and interference aware scheduling.
CoRR, 2016

Massively Concurrent Red-Black Trees with Hardware Transactional Memory.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Contention-Aware Scheduling Policies for Fairness and Throughput.
Proceedings of the Co-Scheduling of HPC Applications [extended versions of all papers from COSH@HiPEAC 2016, 2016

A resource-centric Application Classification Approach.
Proceedings of the 1st COSH Workshop on Co-Scheduling of HPC Applications, 2016

2015
A lightweight optimization selection method for Sparse Matrix-Vector Multiplication.
CoRR, 2015

A Machine-Learning Approach for Communication Prediction of Large-Scale Applications.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

CIRANO: An Integrated Programming Environment for Multi-tier Cloud Based Applications.
Proceedings of the 1st International Conference on Cloud Forward: From Distributed to Complete Computing, 2015

2014
LCA: a memory link and cache-aware co-scheduling approach for CMPs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
An Extended Compression Format for the Optimization of Sparse Matrix-Vector Multiplication.
IEEE Trans. Parallel Distributed Syst., 2013

Improving the Performance of the Symmetric Sparse Matrix-Vector Multiplication in Multicore.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
User Adaptation in a Hybrid MT System - Feeding User Corrections into Synchronous Grammars and System Dictionaries.
Proceedings of the Text, Speech and Dialogue - 15th International Conference, 2012

Using State-of-the-Art Sparse Matrix Optimizations for Accelerating the Performance of Multiphysics Simulations.
Proceedings of the Applied Parallel and Scientific Computing, 2012

2011
CSX: an extended compression format for spmv on shared memory systems.
Proceedings of the 16th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2011

2010
Exploiting compression opportunities to improve SpMxV performance on shared memory systems.
ACM Trans. Archit. Code Optim., 2010

Solving the advection PDE on the cell broadband engine.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Exploring I/O Virtualization Data Paths for MPI Applications in a Cluster of VMs: A Networking Perspective.
Proceedings of the Euro-Par 2010 Parallel Processing Workshops, 2010

2009
Communication-Aware Supernode Shape.
IEEE Trans. Parallel Distributed Syst., 2009

Performance evaluation of the sparse matrix-vector multiplication on modern architectures.
J. Supercomput., 2009

DIANA-microT web server: elucidating microRNA functions through target prediction.
Nucleic Acids Res., 2009

Accurate microRNA target prediction correlates with protein repression levels.
BMC Bioinform., 2009

Exploring the effect of block shapes on the performance of sparse kernels.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Early experiences on accelerating Dijkstra's algorithm using transactional memory.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Employing Transactional Memory and Helper Threads to Speedup Dijkstra's Algorithm.
Proceedings of the ICPP 2009, 2009

Performance Models for Blocked Sparse Matrix-Vector Multiplication Kernels.
Proceedings of the ICPP 2009, 2009

GridNews: A distributed automatic Greek broadcast transcription system.
Proceedings of the IEEE International Conference on Acoustics, 2009

A Comparative Study of Blocking Storage Methods for Sparse Matrices on Multicore Architectures.
Proceedings of the 12th IEEE International Conference on Computational Science and Engineering, 2009

Overlapping computation and communication in SMT clusters with commodity interconnects.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

2008
Understanding the Performance of Sparse Matrix-Vector Multiplication.
Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Evaluation of dynamic scheduling methods in simulations of storm-time ion acceleration.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Improving the Performance of Multithreaded Sparse Matrix-Vector Multiplication Using Index and Value Compression.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

Optimizing sparse matrix-vector multiplication using index and value compression.
Proceedings of the 5th Conference on Computing Frontiers, 2008

2007
Coarse-grain Parallel Execution for 2-dimensional PDE Problems.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006
Message-passing code generation for non-rectangular tiling transformations.
Parallel Comput., 2006

Selecting the tile shape to reduce the total communication volume.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2004
Automatic parallel code generation for tiled nested loops.
Proceedings of the 2004 ACM Symposium on Applied Computing (SAC), 2004

2003
An Efficient Code Generation Technique for Tiled Iteration Spaces.
IEEE Trans. Parallel Distributed Syst., 2003

A pipelined schedule to minimize completion time for loop tiling with computation and communication overlapping.
J. Parallel Distributed Comput., 2003

Delivering High Performance to Parallel Applications Using Advanced Scheduling.
Proceedings of the Parallel Computing: Software Technology, 2003

2002
Code Generation Methods for Tiling Transformations.
J. Inf. Sci. Eng., 2002

Automatic code generation for executing tiled nested loops onto parallel architectures.
Proceedings of the 2002 ACM Symposium on Applied Computing (SAC), 2002

Data Parallel Code Generation for Arbitrarily Tiled Loop Nests.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Compiling Tiled Iteration Spaces for Clusters.
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

2000
Evaluation of Loop Grouping Methods Based on Orthogonal Projection Spaces.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

