Márcio Castro

ORCID: 0000-0002-9992-8540

Affiliations:
  • Federal University of Santa Catarina (UFSC), Informatics and Statistics Department


According to our database, Márcio Castro authored at least 55 papers between 2005 and 2024.

Bibliography

2024
Improving edge AI for industrial IoT applications with distributed learning using consensus.
Des. Autom. Embed. Syst., March, 2024

Enabling the execution of HPC applications on public clouds with HPC@Cloud toolkit.
Concurr. Comput. Pract. Exp., 2024

2023
Improving concurrency and memory usage in distributed operating systems for lightweight manycores via cooperative time-sharing lightweight tasks.
J. Parallel Distributed Comput., April, 2023

LWMPI: An MPI library for NoC-based lightweight manycore processors with on-chip memory constraints.
Concurr. Comput. Pract. Exp., 2023

A Performance Comparison of HPC Workloads on Traditional and Cloud-Based HPC Clusters.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

Message from the Workshop Organizers WCC 2023.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

2022
Distributed Learning using Consensus on Edge AI.
Proceedings of the XII Brazilian Symposium on Computing Systems Engineering, 2022

Strategies for Fault-Tolerant Tightly-Coupled HPC Workloads Running on Low-Budget Spot Cloud Infrastructures.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

2021
ARTful: A model for user-defined schedulers targeting multiple high-performance computing runtime systems.
Softw. Pract. Exp., 2021

Dynamic power management under the RUN scheduling algorithm: a slack filling approach.
Real Time Syst., 2021

Inter-kernel communication facility of a distributed operating system for NoC-based lightweight manycores.
J. Parallel Distributed Comput., 2021

PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization.
J. Parallel Distributed Comput., 2021

Co-Designing Clusters of Lightweight Manycores and Asymmetric Operating System Kernels.
IEEE Embed. Syst. Lett., 2021

A Task-based Execution Engine for Distributed Operating Systems Tailored to Lightweight Manycores with Limited On-Chip Memory.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

A trace-driven methodology to evaluate and optimize memory management services of distributed operating systems for lightweight manycores.
Proceedings of the SAC '21: The 36th ACM/SIGAPP Symposium on Applied Computing, 2021

2020
Adaptive Load Balancing based on Machine Learning for Iterative Parallel Applications.
Proceedings of the 28th Euromicro International Conference on Parallel, 2020

2019
Real-time video denoising on multicores and GPUs with Kalman-based and Bilateral filters fusion.
J. Real Time Image Process., 2019

Foreword to the special issue of the workshop on high performance computing systems (XVIII Simpósio em Sistemas Computacionais de Alto Desempenho, WSCAD 2017).
Concurr. Comput. Pract. Exp., 2019

A comprehensive performance evaluation of the BinLPT workload-aware loop scheduler.
Concurr. Comput. Pract. Exp., 2019

On the Performance and Isolation of Asymmetric Microkernel Design for Lightweight Manycores.
Proceedings of the IX Brazilian Symposium on Computing Systems Engineering, 2019

Distributed Memory Graph Representation for Load Balancing Data: Accelerating Data Structure Generation for Decentralized Scheduling.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

2018
Reducing Global Schedulers Complexity through Runtime System Decoupling.
Proceedings of the Symposium on High Performance Computing Systems, 2018

A Batch Task Migration Approach for Decentralized Global Rescheduling.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Energy Efficient Stencil Computations on the Low-Power Manycore MPPA-256 Processor.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

2017
CAP Bench: a benchmark suite for performance and energy evaluation of low-power many-core processors.
Concurr. Comput. Pract. Exp., 2017

Design methodology for workload-aware loop scheduling strategies based on genetic algorithm and simulation.
Concurr. Comput. Pract. Exp., 2017

Using the Nanvix Operating System in Undergraduate Operating System Courses.
Proceedings of the VII Brazilian Symposium on Computing Systems Engineering, 2017

Towards the Use of LITMUS^RT as a Testbed for Multiprocessor Scheduling in Energy Harvesting Real-Time Systems.
Proceedings of the VII Brazilian Symposium on Computing Systems Engineering, 2017

Automatic Partitioning of Stencil Computations on Heterogeneous Systems.
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

Extending OpenACC for Efficient Stencil Code Generation and Execution by Skeleton Frameworks.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Enabling efficient stencil code generation in OpenACC.
Proceedings of the International Conference on Computational Science, 2017

Assessing the Performance of the SRR Loop Scheduler with Irregular Workloads.
Proceedings of the International Conference on Computational Science, 2017

Performance Improvement of Stencil Computations for Multi-core Architectures based on Machine Learning.
Proceedings of the International Conference on Computational Science, 2017

Provisioning and Delivering Sepsis Data Supported by an Enhanced SDN Environment.
Proceedings of the 30th IEEE International Symposium on Computer-Based Medical Systems, 2017

2016
Seismic wave propagation simulations on low-power and performance-centric manycores.
Parallel Comput., 2016

Exploiting parallelism to speed up circuit legalization.
Proceedings of the 2016 IEEE International Conference on Electronics, Circuits and Systems, 2016

A Low-Cost Energy-Efficient Raspberry Pi Cluster for Data Mining Algorithms.
Proceedings of the Euro-Par 2016: Parallel Processing Workshops, 2016

Exploration of Load Balancing Thresholds to Save Energy on Iterative Applications.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

2015
On the energy efficiency and performance of irregular application executions on multicore, NUMA and manycore platforms.
J. Parallel Distributed Comput., 2015

Performance/energy trade-off in scientific computing: the case of ARM big.LITTLE and Intel Sandy Bridge.
IET Comput. Digit. Tech., 2015

2014
Adaptive thread mapping strategies for transactional memory applications.
J. Parallel Distributed Comput., 2014

Automatic Skeleton-Driven Memory Affinity for Transactional Worklist Applications.
Int. J. Parallel Program., 2014

Energy Efficient Seismic Wave Propagation Simulation on a Low-Power Manycore Processor.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Evaluating the Impact of Transactional Characteristics on the Performance of Transactional Memory Applications.
Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

Saving energy by exploiting residual imbalances on iterative applications.
Proceedings of the 21st International Conference on High Performance Computing, 2014

2013
Analysis of computing and energy performance of multicore, NUMA, and manycore platforms for an irregular application.
Proceedings of the 3rd Workshop on Irregular Applications - Architectures and Algorithms, 2013

2012
Optimisation de la performance des applications de mémoire transactionnelle sur des plates-formes multicoeurs : une approche basée sur l'apprentissage automatique. (Improving the Performance of Transactional Memory Applications on Multicores : A Machine Learning-based Approach).
PhD thesis, 2012

Dynamic Thread Mapping Based on Machine Learning for Transactional Memory Applications.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Analysis and Tracing of Applications Based on Software Transactional Memory on Multicore Architectures.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

A machine learning-based approach for thread mapping on transactional memory applications.
Proceedings of the 18th International Conference on High Performance Computing, 2011

2010
Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

2009
Memory Affinity for Hierarchical Shared Memory Multiprocessors.
Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009

NUMA-ICTM: A parallel version of ICTM exploiting memory placement strategies for NUMA machines.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2006
High performance XSL-FO rendering for variable data printing.
Proceedings of the 2006 ACM Symposium on Applied Computing (SAC), 2006

2005
A Parallel Version for the Propagation Algorithm.
Proceedings of the Parallel Computing Technologies, 2005

