Ana Lucia Varbanescu

Sascha Caron

CoRR, 2024

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies.

Felix Brakel

Uraz Odyurt

CoRR, 2024

Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame.

Jolly Chen

Monica Dessole

CoRR, 2024

Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays.

Proceedings of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

GraphSys-2024: 2nd Workshop on Serverless, Extreme-Scale, and Sustainable Graph Processing Systems.

Alexandru Iosup

Radu Prodan

Proceedings of the Companion of the 15th ACM/SPEC International Conference on Performance Engineering, 2024

Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame.

Jolly Chen

Monica Dessole

Proceedings of the 12th International Workshop on OpenCL and SYCL, 2024

Distillation vs. Sampling for Efficient Training of Learning to Rank Models.

Proceedings of the 2024 ACM SIGIR International Conference on Theory of Information Retrieval, 2024

Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven Physics Research.

Uraz Odyurt

Sascha Caron

Proceedings of the Computational Science - ICCS 2024, 2024

Analyzing Per-Application Energy Consumption in a Multi-Application Computing Continuum.

Proceedings of the 2024 9th International Conference on Fog and Mobile Edge Computing (FMEC), 2024

2023

Finding Morton-Like Layouts for Multi-Dimensional Arrays Using Evolutionary Algorithms.

CoRR, 2023

Graph-Optimizer: Towards Predictable Large-Scale Graph Processing Workloads.

Andrea Bartolini

Proceedings of the Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Systematically Exploring High-Performance Representations of Vector Fields Through Compile-Time Composition.

Proceedings of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Graph Greenifier: Towards Sustainable and Energy-Aware Massive Graph Processing in the Computing Continuum.

Proceedings of the Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 2023

Estimating the Energy Consumption of Applications in the Computing Continuum with iFogSim.

Proceedings of the High Performance Computing, 2023

Performance Engineering for Graduate Students: a View from Amsterdam.

Anuj Pathania

Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Analyzing Digital Services Across the Compute Continuum Using iFogSim.

Proceedings of the 29th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, 2023

MassiveClicks: A Massively-Parallel Framework for Efficient Click Models Training.

Proceedings of the Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28, 2023

The Graph-Massivizer Approach Toward a European Sustainable Data Center Digital Twin.

Proceedings of the 47th IEEE Annual Computers, Software, and Applications Conference, 2023

2022

Future Computer Systems and Networking Research in the Netherlands: A Manifesto.

CoRR, 2022

ParClick: A Scalable Algorithm for EM-based Click Models.

Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Isolating GPU Architectural Features Using Parallelism-Aware Microbenchmarks.

Rico van Stigt

Proceedings of the ICPE '22: ACM/SPEC International Conference on Performance Engineering, Beijing, China, April 9, 2022

The Cost of Reinforcement Learning for Game Engines: The AZ-Hive Case-study.

Danilo de Goede

Duncan Kampert

Proceedings of the ICPE '22: ACM/SPEC International Conference on Performance Engineering, Beijing, China, April 9, 2022

Building a Fine-Grained Analytical Performance Model for Complex Scientific Simulations.

Proceedings of the Parallel Processing and Applied Mathematics, 2022

Modelling Performance Loss due to Thread Imbalance in Stochastic Variable-Length SIMT Workloads.

Attila Krasznahorkay

Andy D. Pimentel

Proceedings of the 30th International Symposium on Modeling, 2022

Design-Space Exploration for Decision-Support Software.

Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022

Heterogeneous GPU and FPGA computing: a VexCL case-study.

Tristan Laan

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Efficient trimming for strongly connected components calculation.

Dante Niewenhuis

José Gabriel de Figueiredo Coutinho

Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

2021

Analytical Performance Estimation for Large-Scale Reconfigurable Dataflow Platforms.

Ryota Yasudo

ACM Trans. Reconfigurable Technol. Syst., 2021

The future is big graphs: a community view on graph processing systems.

Commun. ACM, 2021

Mimicking the Human Approach in the Game of Hive.

Duncan Kampert

Matthias Müller-Brockhausen

Aske Plaat

Proceedings of the IEEE Symposium Series on Computational Intelligence, 2021

2020

Designing and building application-centric parallel memories.

Concurr. Comput. Pract. Exp., 2020

A Sampling-Based Tool for Scaling Graph Datasets.

Proceedings of the ICPE '20: ACM/SPEC International Conference on Performance Engineering, 2020

DDLBench: Towards a Scalable Benchmarking Infrastructure for Distributed Deep Learning.

Matthijs Jansen

Valeriu Codreanu

Proceedings of the Fourth IEEE/ACM Workshop on Deep Learning on Supercomputers, 2020

μ-Genie: A Framework for Memory-Aware Spatial Processor Architecture Co-Design Exploration.

Proceedings of the 23rd Euromicro Conference on Digital System Design, 2020

2019

Scalability model for the LOFAR direction independent pipeline.

Astron. Comput., 2019

2018

HLS Support for Polymorphic Parallel Memories.

Luca Stornaiuolo

Marco Rabozzi

Donatella Sciuto

Proceedings of the IFIP/IEEE International Conference on Very Large Scale Integration, 2018

Building High-Performance, Easy-to-Use Polymorphic Parallel Memories with HLS.

Luca Stornaiuolo

Marco Rabozzi

Donatella Sciuto

Proceedings of the VLSI-SoC: Design and Engineering of Electronics Systems Based on New Computing Paradigms, 2018

A Beginner's Guide to Estimating and Improving Performance Portability.

Henk Dreuning

Roel Heirman

Proceedings of the High Performance Computing, 2018

Mix-and-Match: A Model-Driven Runtime Optimisation Strategy for BFS on GPUs.

Proceedings of the 8th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2018

EXTRA: an open platform for reconfigurable architectures.

Dionisios N. Pnevmatikatos

Grigorios Chrysos

Charalampos Vatsolakis

Georgios Charitopoulos

Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018

MAX-PolyMem: High-Bandwidth Polymorphic Parallel Memories for DFEs.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Performance Estimation for Exascale Reconfigurable Dataflow Platforms.

Ryota Yasudo

José Gabriel F. Coutinho

Proceedings of the International Conference on Field-Programmable Technology, 2018

Performance Prediction for Large-Scale Heterogeneous Platforms.

Ryota Yasudo

José Gabriel F. Coutinho

Wayne Luk

Hideharu Amano

Proceedings of the 26th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2018

Towards Application-Centric Parallel Memories.

Proceedings of the Euro-Par 2018: Parallel Processing Workshops, 2018

Exploring HPC and Big Data Convergence: A Graph Processing Study on Intel Knights Landing.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach.

CoRR, 2017

A Performance-centric Approach for Complex Decision Support.

Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017

A NoC-based custom FPGA configuration memory architecture for ultra-fast micro-reconfiguration.

Proceedings of the International Conference on Field Programmable Technology, 2017

2016

Workload Partitioning for Accelerating Applications on Heterogeneous Platforms.

IEEE Trans. Parallel Distributed Syst., 2016

The landscape of GPGPU performance modeling tools.

Parallel Comput., 2016

Dynamic Load Balancing for High-Performance Graph Processing on Hybrid CPU-GPU Platforms.

Stijn Heldens

Alexandru Iosup

Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

EXTRA: Towards the exploitation of eXascale technology for reconfigurable architectures.

Dirk Stroobandt

Dionisios N. Pnevmatikatos

Amit Kulkarni

Elias Vansteenkiste

Wayne Luk

Proceedings of the 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip, 2016

A Tool for Bottleneck Analysis and Performance Prediction for GPU-Accelerated Applications.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Heterogeneous computing with accelerators: an overview with examples.

Jie Shen

Proceedings of the 2016 Forum on Specification and Design Languages, 2016

Synthetic Graph Generation for Systematic Exploration of Graph Structural Properties.

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Speed-Up Computational Finance Simulations with OpenCL on Intel Xeon Phi.

Michail Papadimitriou

Joris Cramwinckel

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Towards the Next Generation of Large-Scale Network Archives.

Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Using colored petri nets for GPGPU performance modeling.

Souley Madougou

Proceedings of the ACM International Conference on Computing Frontiers, CF'16, 2016

Design and Experimental Evaluation of Distributed Heterogeneous Graph-Processing Systems.

Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

2015

Evaluating vector data type usage in OpenCL kernels.

Concurr. Comput. Pract. Exp., 2015

Can Portability Improve Performance?: An Empirical Study of Parallel Graph Analytics.

Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, Austin, TX, USA, January 31, 2015

Computing the Pseudo-Inverse of a Graph's Laplacian Using GPUs.

Nishant Saurabh

Gyan Ranjan

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms.

Proceedings of the 44th International Conference on Parallel Processing, 2015

Quantifying the Performance Impact of Graph Structure on Neighbour Iteration Strategies for PageRank.

Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

Towards Community Detection on Heterogeneous Platforms.

Stijn Heldens

Arnau Prat-Pérez

Josep Lluís Larriba-Pey

Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

FiNS: A Framework for Accelerating Nested Simulations on Heterogeneous Platforms.

Joris Cramwinckel

Stefan Singor

Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

EXTRA: Towards an Efficient Open Platform for Reconfigurable High Performance Computing.

Dionisios N. Pnevmatikatos

George Charitopoulos

Xinyu Niu

Wayne Luk

Proceedings of the 18th IEEE International Conference on Computational Science and Engineering, 2015

Fast packet forwarding engine based on software circuits.

Marc X. Makkes

Cees T. A. M. de Laat

Robert J. Meijer

Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

Improving Application Performance by Efficiently Utilizing Heterogeneous Many-core Platforms.

Jie Shen

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

An Empirical Performance Evaluation of GPU-Enabled Graph-Processing Systems.

Proceedings of the 15th IEEE/ACM International Symposium on Cluster, 2015

2014

Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly.

Fabio Luporini

Florian Rathgeber

Gheorghe-Teodor Bercea

J. Ramanujam

David A. Ham

Paul H. J. Kelly

ACM Trans. Archit. Code Optim., 2014

Aristotle: A performance impact indicator for the OpenCL kernels using local memory.

Sci. Program., 2014

COFFEE: an Optimizing Compiler for Finite Element Local Assembly.

Fabio Luporini

Horacio Emilio Pérez Sánchez

Florian Rathgeber

Gheorghe-Teodor Bercea

J. Ramanujam

David A. Ham

Paul H. J. Kelly

CoRR, 2014

Benchmarking graph-processing platforms: a vision.

Proceedings of the ACM/SPEC International Conference on Performance Engineering, 2014

Test-driving Intel Xeon Phi.

Proceedings of the ACM/SPEC International Conference on Performance Engineering, 2014

Towards Benchmarking IaaS and PaaS Clouds for Graph Analytics.

Proceedings of the Big Data Benchmarking - 5th International Workshop, 2014

Parallel Computation of Non-Bonded Interactions in Drug Discovery: Nvidia GPUs vs. Intel Xeon Phi.

Proceedings of the International Work-Conference on Bioinformatics and Biomedical Engineering, 2014

How Well Do Graph-Processing Platforms Perform? An Empirical Performance Evaluation and Analysis.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Improving performance by matching imbalanced workloads with heterogeneous platforms.

Proceedings of the 2014 International Conference on Supercomputing, 2014

Grover: Looking for Performance Improvement by Disabling Local Memory Usage in OpenCL Kernels.

Proceedings of the 43rd International Conference on Parallel Processing, 2014

Look before You Leap: Using the Right Hardware Resources to Accelerate Applications.

Jie Shen

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

Optimizing a Calibration Software for Radio Astronomy.

Souley Madougou

Rob van Nieuwpoort

Proceedings of the 2014 IEEE International Conference on High Performance Computing and Communications, 2014

An Empirical Evaluation of GPGPU Performance Models.

Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

KMA: A Dynamic Memory Manager for OpenCL.

Proceedings of the Seventh Workshop on General Purpose Processing Using GPUs, 2014

2013

An application-centric evaluation of OpenCL on multi-core CPUs.

Parallel Comput., 2013

An Empirical Study of Intel Xeon Phi.

CoRR, 2013

Performance Traps in OpenCL for CPUs.

Proceedings of the 21st Euromicro International Conference on Parallel, 2013

ELMO: A User-Friendly API to Enable Local Memory in OpenCL Kernels.

Proceedings of the 21st Euromicro International Conference on Parallel, 2013

Topic 9: Parallel and Distributed Programming - (Introduction).

Proceedings of the Euro-Par 2013 Parallel Processing, 2013

Glinda: a framework for accelerating imbalanced applications on heterogeneous platforms.

Proceedings of the Computing Frontiers Conference, 2013

Sesame: A User-Transparent Optimizing Framework for Many-Core Processors.

Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013

2012

Parallel application characterization with quantitative metrics.

Concurr. Comput. Pract. Exp., 2012

Radio Astronomy Beam Forming on Many-Core Architectures.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Performance Gaps between OpenMP and OpenCL for Multi-core CPUs.

Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

Accelerating Cost Aggregation for Real-Time Stereo Matching.

Laurens van der Maaten

Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

2011

Towards an Effective Unified Programming Model for Many-Cores.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

OCL-BodyScan: A Case Study for Application-centric Programming of Many-Core Processors.

Proceedings of the International Conference on Parallel Processing, 2011

A Comprehensive Performance Comparison of CUDA and OpenCL.

Proceedings of the International Conference on Parallel Processing, 2011

An Auto-tuning Solution to Data Streams Clustering in OpenCL.

Proceedings of the 14th IEEE International Conference on Computational Science and Engineering, 2011

2010

On the effective parallel programming of multi-core processors.

PhD thesis, 2010

Performance Impact of Task Mapping on the Cell BE Multicore Processor.

Jörg Keller

Proceedings of the Computer Architecture, 2010

2009

Building high-resolution sky images using the Cell/B.E.

Sci. Program., 2009

Evaluating application mapping scenarios on the Cell/B.E.

Concurr. Comput. Pract. Exp., 2009

Introduction to Mastering Cell BE and GPU Execution Platforms.

Ed F. Deprettere

Proceedings of the Embedded Computer Systems: Architectures, 2009

Evaluating multi-core platforms for HPC data-intensive kernels.

Rob van Nieuwpoort

Proceedings of the 6th Conference on Computing Frontiers, 2009

2008

Radioastronomy Image Synthesis on the Cell/B.E.

Proceedings of the Euro-Par 2008, 2008

2007

Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

An Effective Strategy for Porting C++ Applications on Cell.

Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Digital Media Indexing on the Cell Processor.

Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

2006

SP@CE - An SP-Based Programming Model for Consumer Electronics Streaming Applications.

Arturo González-Escribano

Maik Nijhuis

Herbert Bos

Henri E. Bal

Proceedings of the Languages and Compilers for Parallel Computing, 2006

PAM-SoC: A Toolchain for Predicting MPSoC Performance.
