Philippe Olivier Alexandre Navaux

Orcid: 0000-0002-9957-5861

  • Federal University of Rio Grande do Sul, Porto Alegre, Brazil

According to our database, Philippe Olivier Alexandre Navaux authored at least 282 papers between 1979 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of two.






HBPB, applying reuse distance to improve cache efficiency proactively.
J. Parallel Distributed Comput., 2024

Temporal Load Imbalance on Ondes3D Seismic Simulator for Different Multicore Architectures.
CoRR, 2024

Leveraging Cloud Computing for Stock Market Forecasting with Reinforcement Learning.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing, 2024

BTO, Block and Thread Optimization of GPU Kernels on Geophysical Exploration.
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024

Interleaved Execution of Approximated CUDA Kernels in Iterative Applications.
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024

Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

An ANN-Guided Multi-Objective Framework for Power-Performance Balancing in HPC Systems.
Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024

Uncovering I/O demands on HPC platforms: Peeking under the hood of Santos Dumont.
J. Parallel Distributed Comput., December, 2023

Challenges in High-Performance Computing.
J. Braz. Comput. Soc., February, 2023

Smart resource allocation of concurrent execution of parallel applications.
Concurr. Comput. Pract. Exp., 2023

Mitigating execution unit contention in parallel applications using instruction-aware mapping.
Concurr. Comput. Pract. Exp., 2023

Evaluation Model and Performance Analysis of NIC Aggregations in Containerized Private Clouds.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

Harnessing Cloud Computing for Geophysical Exploration.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

NeurOPar, A Neural Network-Driven EDP Optimization Strategy for Parallel Workloads.
Proceedings of the 35th IEEE International Symposium on Computer Architecture and High Performance Computing, 2023

Message from the Workshop Organizers WCC 2023.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2023

Deploying Deep Learning Models Using Serverless Computing for Diabetic Retinopathy Detection.
Proceedings of the Computational Science and Its Applications - ICCSA 2023 Workshops, 2023

Accelerating Deep Learning Model Training on Cloud Tensor Processing Unit.
Proceedings of the 13th International Conference on Cloud Computing and Services Science, 2023

The Impact of CUDA Execution Configuration Parameters on the Performance and Energy of a Seismic Application.
Proceedings of the High Performance Computing - 10th Latin American Conference, 2023

Towards a Multi-GPU Implementation of a Seismic Application.
Proceedings of the High Performance Computing - 10th Latin American Conference, 2023

Terminator: A Secure Coprocessor to Accelerate Real-Time AntiViruses Using Inspection Breakpoints.
ACM Trans. Priv. Secur., 2022

Optimizing the EDP of OpenMP applications via concurrency throttling and frequency boosting.
J. Syst. Archit., 2022

Parallel Performance and I/O Profiling of HPC RNA-Seq Applications.
Computación y Sistemas, 2022

Janus: a framework to boost HPC applications in the cloud based on SDN path provisioning.
Clust. Comput., 2022

Edge Computing versus Cloud Computing: Impact on Retinal Image Pre-processing.
Proceedings of the International Symposium on Computer Architecture and High Performance Computing Workshops, 2022

Avoiding Unnecessary Caching with History-Based Preemptive Bypassing.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Hyperparameter Optimization for Convolutional Neural Networks with Genetic Algorithms and Bayesian Optimization.
Proceedings of the 2022 IEEE Latin American Conference on Computational Intelligence (LA-CCI), 2022

Investigating Oil and Gas CSEM Application on Vector Architectures.
Proceedings of the Computational Science and Its Applications - ICCSA 2022 Workshops, 2022

Impact of Reduced and Mixed-Precision on the Efficiency of a Multi-GPU Platform on CFD Applications.
Proceedings of the Computational Science and Its Applications - ICCSA 2022 Workshops, 2022

ParslRNA-Seq: An Efficient and Scalable RNAseq Analysis Workflow for Studies of Differentiated Gene Expression.
Proceedings of the High Performance Computing - 9th Latin American Conference, 2022

Companion for "Temporal Load Imbalance on Ondes3D Seismic Simulator for Different Multicore Architectures".
Dataset, February, 2021

Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit.
ACM Trans. Model. Perform. Evaluation Comput. Syst., 2021

Thermal neutrons: a possible threat for supercomputer reliability.
J. Supercomput., 2021

Collaborative execution of fluid flow simulation using non-uniform decomposition on heterogeneous architectures.
J. Parallel Distributed Comput., 2021

Energy efficiency and portability of oil and gas simulations on multicore and graphics processing unit architectures.
Concurr. Comput. Pract. Exp., 2021

Investigating memory prefetcher performance over parallel applications: From real to simulated.
Concurr. Comput. Pract. Exp., 2021

Offloading the Training of an I/O Access Pattern Detector to the Cloud.
Proceedings of the 33rd International Symposium on Computer Architecture and High Performance Computing, 2021

HPC Data Storage at a Glance: The Santos Dumont Experience.
Proceedings of the 33rd IEEE International Symposium on Computer Architecture and High Performance Computing, 2021

Optimizing Parallel Applications via Dynamic Concurrency Throttling and Turbo Boosting.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Combining Thread Throttling and Mapping to Optimize the EDP of Parallel Applications.
Proceedings of the 29th Euromicro International Conference on Parallel, 2021

Lightweight Deep Learning Applications on AVX-512.
Proceedings of the IEEE Symposium on Computers and Communications, 2021

Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Harnessing Cloud Computing to Power Up HPC Applications: The BRICS CloudHPC Project.
Proceedings of the Computational Science and Its Applications - ICCSA 2021, 2021

Improving Performance of Long Short-Term Memory Networks for Sentiment Analysis Using Multicore and GPU Architectures.
Proceedings of the High Performance Computing - 8th Latin American Conference, 2021

An Analysis of Neural Architecture Search and Hyper Parameter Optimization Methods.
Proceedings of the High Performance Computing - 8th Latin American Conference, 2021

I/O performance of the Santos Dumont supercomputer.
Int. J. High Perform. Comput. Appl., 2020

Adaptive request scheduling for the I/O forwarding layer using reinforcement learning.
Future Gener. Comput. Syst., 2020

Task-based parallel strategies for computational fluid dynamic application in heterogeneous CPU/GPU resources.
Concurr. Comput. Pract. Exp., 2020

Performance and error analysis of recursive edge-aware Gaussian filters on GPUs.
Proceedings of the 33rd SIBGRAPI Conference on Graphics, Patterns and Images, 2020

Towards On-Demand I/O Forwarding in HPC Platforms.
Proceedings of the Fifth IEEE/ACM International Parallel Data Systems Workshop, 2020

Firefly: An Open-source Rocket-based Intermittent Framework.
Proceedings of the 33rd Symposium on Integrated Circuits and Systems Design, 2020

Attesting L-3 General Program Anomaly Detection Efficiency with SPADA.
Proceedings of the IEEE Symposium on Computers and Communications, 2020

Performance and Cost-aware HPC in Clouds: A Network Interconnection Assessment.
Proceedings of the IEEE Symposium on Computers and Communications, 2020

Performance Impact of IEEE 802.3ad in Container-Based Clouds for HPC Applications.
Proceedings of the Computational Science and Its Applications - ICCSA 2020, 2020

The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures.
Proceedings of the Computational Science and Its Applications - ICCSA 2020, 2020

Thermal Neutrons: a Possible Threat for Supercomputers and Safety Critical Applications.
Proceedings of the IEEE European Test Symposium, 2020

An Overview of the Risk Posed by Thermal Neutrons to the Reliability of Computing Devices.
Proceedings of the 50th Annual IEEE-IFIP International Conference on Dependable Systems and Networks, 2020

Accelerating Machine Learning Algorithms with TensorFlow Using Thread Mapping Policies.
Proceedings of the High Performance Computing - 7th Latin American Conference, 2020

EagerMap: A Task Mapping Algorithm to Improve Communication and Load Balancing in Clusters of Multicore Systems.
ACM Trans. Parallel Comput., 2019

Optimization strategies for geophysics models on manycore systems.
Int. J. High Perform. Comput. Appl., 2019

Performance modeling of a geophysics application to accelerate over-decomposition parameter tuning through simulation.
Concurr. Comput. Pract. Exp., 2019

Energy efficiency and I/O performance of low-power architectures.
Concurr. Comput. Pract. Exp., 2019

An Unsupervised Learning Approach for I/O Behavior Characterization.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Managing Power Demand and Load Imbalance to Save Energy on Systems with Heterogeneous CPU Speeds.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Non-uniform Partitioning for Collaborative Execution on Heterogeneous Architectures.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Detecting I/O Access Patterns of HPC Workloads at Runtime.
Proceedings of the 31st International Symposium on Computer Architecture and High Performance Computing, 2019

Memory Performance and Bottlenecks in Multicore and GPU Architectures.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

A Dynamic Task-Based D3Q19 Lattice-Boltzmann Method for Heterogeneous Architectures.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Impact of Workload Distribution on Energy Consumption, Performance, and Reliability of Heterogeneous Devices.
Proceedings of the 27th Euromicro International Conference on Parallel, 2019

Multi-phased Task Placement of HPC Applications in the Cloud.
Proceedings of the 18th International Symposium on Parallel and Distributed Computing, 2019

Minimizing Communication Overheads in Container-based Clouds for HPC Applications.
Proceedings of the 2019 IEEE Symposium on Computers and Communications, 2019

On server-side file access pattern matching.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Boosting HPC Applications in the Cloud Through JIT Traffic-Aware Path Provisioning.
Proceedings of the Computational Science and Its Applications - ICCSA 2019, 2019

Impact of Reduced Precision in the Reliability of Deep Neural Networks for Object Detection.
Proceedings of the 24th IEEE European Test Symposium, 2019

Increasing the Efficiency and Efficacy of Selective-Hardening for Parallel Applications.
Proceedings of the 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2019

Identifying the Most Reliable Collaborative Workload Distribution in Heterogeneous Devices.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2019

Exploring Instance Heterogeneity in Public Cloud Providers for HPC Applications.
Proceedings of the 9th International Conference on Cloud Computing and Services Science, 2019

GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing Environments.
Proceedings of the 9th International Conference on Cloud Computing and Services Science, 2019

SPADA: a statistical program attack detection analysis.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

Optimizing Water Cooling Applications on Shared Memory Systems.
Proceedings of the High Performance Computing - 6th Latin American Conference, 2019

Thread and Data Mapping for Multicore Systems - Improving Communication and Memory Accesses
Springer Briefs in Computer Science, Springer, ISBN: 978-3-319-91073-4, 2018

Energy-Delay-FIT Product to compare processors and algorithm implementations.
Microelectron. Reliab., 2018

A lightweight plug-and-play elasticity service for self-organizing resource provisioning on parallel applications.
Future Gener. Comput. Syst., 2018

MigPF: Towards on self-organizing process rescheduling of Bulk-Synchronous Parallel applications.
Future Gener. Comput. Syst., 2018

A Checkpoint of Research on Parallel I/O for High-Performance Computing.
ACM Comput. Surv., 2018

Optimizing Geophysics Models Using Thread and Data Mapping.
Proceedings of the Symposium on High Performance Computing Systems, 2018

Improving Oil and Gas Simulation Performance Using Thread and Data Mapping.
Proceedings of the High Performance Computing Systems - 19th Symposium, 2018

Application of the SmartLB Load Balancer to Runtime and Power Consumption Reduction of Applications in Parallel Environments.
Proceedings of the Symposium on High Performance Computing Systems, 2018

Improving I/O Performance of RTM Algorithm for Oil and Gas Simulation.
Proceedings of the Symposium on High Performance Computing Systems, 2018

Design, Implementation and Performance Analysis of a CFD Task-Based Application for Heterogeneous CPU/GPU Resources.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2018, 2018

Non-uniform Domain Decomposition for Heterogeneous Accelerated Processing Units.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2018, 2018

Predicting the Reliability Behavior of HPC Applications.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Optimizing Machine Learning Algorithms on Multi-Core and Many-Core Architectures Using Thread and Data Mapping.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Improving Communication and Load Balancing with Thread Mapping in Manycore Systems.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Collective I/O Performance on the Santos Dumont Supercomputer.
Proceedings of the 26th Euromicro International Conference on Parallel, 2018

Exploiting Load Imbalance Patterns for Heterogeneous Cloud Computing Platforms.
Proceedings of the 8th International Conference on Cloud Computing and Services Science, 2018

Improving Performance and Energy Efficiency of Geophysics Applications on GPU Architectures.
Proceedings of the High Performance Computing - 5th Latin American Conference, 2018

Performance Evaluation of Stencil Computations Based on Source-to-Source Transformations.
Proceedings of the High Performance Computing - 5th Latin American Conference, 2018

Modeling memory access behavior for data mapping.
Int. J. High Perform. Comput. Appl., 2017

Affinity-Based Thread and Data Mapping in Shared Memory Systems.
ACM Comput. Surv., 2017

CAP Bench: a benchmark suite for performance and energy evaluation of low-power many-core processors.
Concurr. Comput. Pract. Exp., 2017

Performance and energy efficiency analysis of HPC physics simulation applications in a cluster of ARM processors.
Concurr. Comput. Pract. Exp., 2017

Exploiting Price and Performance Tradeoffs in Heterogeneous Clouds.
Proceedings of the Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

A Distributed Stream Processing based Architecture for IoT Smart Grids Monitoring.
Proceedings of the Companion Proceedings of the 10th International Conference on Utility and Cloud Computing, 2017

Experimental and analytical study of Xeon Phi reliability.
Proceedings of the International Conference for High Performance Computing, 2017

Potential Gains in EDP by Dynamically Adapting the Number of Threads for OpenMP Applications in Embedded Systems.
Proceedings of the VII Brazilian Symposium on Computing Systems Engineering, 2017

Strategies to Improve the Performance of a Geophysics Model for Different Manycore Systems.
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017

HPC Application Performance and Cost Efficiency in the Cloud.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

High Performance I/O for Seismic Wave Propagation Simulations.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

TWINS: Server Access Coordination in the I/O Forwarding Layer.
Proceedings of the 25th Euromicro International Conference on Parallel, 2017

Using Power Demand and Residual Load Imbalance in the Load Balancing to Save Energy of Parallel Systems.
Proceedings of the International Conference on Computational Science, 2017

Performance Improvement of Stencil Computations for Multi-core Architectures based on Machine Learning.
Proceedings of the International Conference on Computational Science, 2017

Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators.
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017

Using Simulation to Evaluate and Tune the Performance of Dynamic Load Balancing of an Over-Decomposed Geophysics Application.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

Leveraging Cloud Heterogeneity for Cost-Efficient Execution of Parallel Applications.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

Evaluation and Mitigation of Soft-Errors in Neural Network-Based Object Detection in Three GPU Architectures.
Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops, 2017

Kernel vulnerability factor and efficient hardening for histogram of oriented gradients.
Proceedings of the IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2017

CAROL-FI: an Efficient Fault-Injection Tool for Vulnerability Evaluation of Modern HPC Parallel Accelerators.
Proceedings of the Computing Frontiers Conference, 2017

Data mining the memory access stream to detect anomalous application behavior.
Proceedings of the Computing Frontiers Conference, 2017

Optimizing memory affinity with a hybrid compiler/OS approach.
Proceedings of the Computing Frontiers Conference, 2017

Performance Prediction of Acoustic Wave Numerical Kernel on Intel Xeon Phi Processor.
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart Grid Application.
Proceedings of the High Performance Computing - 4th Latin American Conference, 2017

Kernel-Based Thread and Data Mapping for Improved Memory Affinity.
IEEE Trans. Parallel Distributed Syst., 2016

Evaluation of Histogram of Oriented Gradients Soft Errors Criticality for Automotive Applications.
ACM Trans. Archit. Code Optim., 2016

Hardware-Assisted Thread and Data Mapping in Hierarchical Multicore Architectures.
ACM Trans. Archit. Code Optim., 2016

A dynamic block-level execution profiler.
Parallel Comput., 2016

LAPT: A locality-aware page table for thread and data mapping.
Parallel Comput., 2016

Seismic wave propagation simulations on low-power and performance-centric manycores.
Parallel Comput., 2016

Automatic I/O scheduling algorithm selection for parallel file systems.
Concurr. Comput. Pract. Exp., 2016

How Programming Languages and Paradigms Affect Performance and Energy in Multithreaded Applications.
Proceedings of the VI Brazilian Symposium on Computing Systems Engineering, 2016

Exploring Cache Size and Core Count Tradeoffs in Systems with Reduced Memory Access Latency.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Analyzing and Improving Memory Access Patterns of Large Irregular Applications on NUMA Machines.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Communication in Shared Memory: Concepts, Definitions, and Efficient Detection.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

Towards Weather Forecasting in the Cloud.
Proceedings of the 24th Euromicro International Conference on Parallel, 2016

System energy analysis for shared memory multiprocessing applications.
Proceedings of the 2016 IEEE International Conference on Electronics, Circuits and Systems, 2016

A Sharing-Aware Memory Management Unit for Online Mapping in Multi-core Architectures.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

enerGyPU and enerGyPhi Monitor for Power Consumption and Performance Evaluation on Nvidia Tesla GPU and Intel Xeon Phi.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Fostering Collaboration in Energy Research and Technological Developments Applying New Exascale HPC Techniques.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Automatic Communication Optimization of Parallel Applications in Public Clouds.
Proceedings of the IEEE/ACM 16th International Symposium on Cluster, 2016

Performance Evaluation of Multiple Cloud Data Centers Allocations for HPC.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

Exploration of Load Balancing Thresholds to Save Energy on Iterative Applications.
Proceedings of the High Performance Computing - Third Latin American Conference, 2016

Characterizing communication and page usage of parallel applications for thread and data mapping.
Perform. Evaluation, 2015

Communication-aware process and thread mapping using online communication detection.
Parallel Comput., 2015

On the energy efficiency and performance of irregular application executions on multicore, NUMA and manycore platforms.
J. Parallel Distributed Comput., 2015

Performance/energy trade-off in scientific computing: the case of ARM big.LITTLE and Intel Sandy Bridge.
IET Comput. Digit. Tech., 2015

Communication-aware thread mapping using the translation lookaside buffer.
Concurr. Comput. Pract. Exp., 2015

TABARNAC: visualizing and resolving memory access issues on NUMA architectures.
Proceedings of the 2nd Workshop on Visual Performance Analysis, 2015

Towards Seismic Wave Modeling on Heterogeneous Many-Core Architectures Using Task-Based Runtime System.
Proceedings of the 27th International Symposium on Computer Architecture and High Performance Computing, 2015

Characterizing Anomalies of a Multicore ARMv7 Cluster with Parallel N-Body Simulations.
Proceedings of the 2015 International Symposium on Computer Architecture and High Performance Computing Workshops, 2015

Performance impact of operating systems' caching parameters on parallel file systems.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Partial coscheduling of virtual machines based on memory access patterns.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Towards fast profiling of storage devices regarding access sequentiality.
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015

Locality vs. Balance: Exploring Data Mapping Policies on NUMA Systems.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

An Efficient Algorithm for Communication-Based Task Mapping.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Challenges and Solutions in Executing Numerical Weather Prediction in a Cloud Infrastructure.
Proceedings of the International Conference on Computational Science, 2015

The Path to Exascale: Code Optimizations and Hardening Solutions Reliability.
Proceedings of the 5th Workshop on Fault Tolerance for HPC at eXtreme Scale, 2015

SiNUCA: A Validated Micro-Architecture Simulator.
Proceedings of the 17th IEEE International Conference on High Performance Computing and Communications, 2015

Understanding GPU errors on large-scale HPC systems and the implications for system design and operation.
Proceedings of the 21st IEEE International Symposium on High Performance Computer Architecture, 2015

Locality and Balance for Communication-Aware Thread Mapping in Multicore Systems.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Porting a Numerical Atmospheric Model to a Cloud Service.
Proceedings of the High Performance Computing - Second Latin American Conference, 2015

Best of SBAC-PAD 2012.
Parallel Comput., 2014

Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols.
J. Parallel Distributed Comput., 2014

A topology-aware load balancing algorithm for clustered hierarchical multi-core machines.
Future Gener. Comput. Syst., 2014

Optimizing Memory Locality Using a Locality-Aware Page Table.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Energy Efficient Seismic Wave Propagation Simulation on a Low-Power Manycore Processor.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Improving the Performance of Seismic Wave Simulations with Dynamic Load Balancing.
Proceedings of the 22nd Euromicro International Conference on Parallel, 2014

Saving energy by exploiting residual imbalances on iterative applications.
Proceedings of the 21st International Conference on High Performance Computing, 2014

Impact of GPUs Parallelism Management on Safety-Critical and HPC Applications Reliability.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

Radiation Sensitivity of High Performance Computing Applications on Kepler-Based GPGPUs.
Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2014

GPGPUs ECC efficiency and efficacy.
Proceedings of the 2014 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, 2014

kMAF: automatic kernel-level management of thread and data affinity.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

Preserving the original MPI semantics in a virtualized processor environment.
Sci. Comput. Program., 2013

Evaluating application performance and energy consumption on hybrid CPU+GPU architecture.
Clust. Comput., 2013

Energy Efficient Last Level Caches via Last Read/Write Prediction.
Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

Communication-Based Mapping Using Shared Pages.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

AGIOS: Application-Guided I/O Scheduling for Parallel File Systems.
Proceedings of the 19th IEEE International Conference on Parallel and Distributed Systems, 2013

Neutron sensitivity and software hardening strategies for matrix multiplication and FFT on graphics processing units.
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale, 2013

A hierarchical aggregation model to achieve visualization scalability in the analysis of parallel applications.
Parallel Comput., 2012

Memory-aware Thread and Data Mapping for Hierarchical Multi-core Platforms.
Int. J. Netw. Comput., 2012

Atmospheric models hybrid OpenMP/MPI implementation multicore cluster evaluation.
Int. J. Inf. Technol. Commun. Convergence, 2012

Energy Savings via Dead Sub-Block Prediction.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

DIMVHCM: An On-line Distributed Monitoring Data Collection Model.
Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Using the Translation Lookaside Buffer to Map Threads in Parallel Applications Based on Shared Memory.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Evaluating Performance and Energy on ARM-based Clusters for High Performance Computing.
Proceedings of the 41st International Conference on Parallel Processing Workshops, 2012

A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems.
Proceedings of the 41st International Conference on Parallel Processing, 2012

Asymptotically Optimal Load Balancing for Hierarchical Multi-Core Systems.
Proceedings of the 18th IEEE International Conference on Parallel and Distributed Systems, 2012

High Performance Computing in the cloud: Deployment, performance and cost efficiency.
Proceedings of the 4th IEEE International Conference on Cloud Computing Technology and Science Proceedings, 2012

Evaluating High Performance Computing on the Windows Azure Platform.
Proceedings of the 2012 IEEE Fifth International Conference on Cloud Computing, 2012

High Latency and Contention on Shared L2-Cache for Many-Core Architectures.
Parallel Process. Lett., 2011

Boosting Parallel Applications Performance on Applying DIM Technique in a Multiprocessing Environment.
Int. J. Reconfigurable Comput., 2011

Challenges and solutions to improve the scalability of an operational regional meteorological forecasting model.
Int. J. High Perform. Syst. Archit., 2011

The impact of applications' I/O strategies on the performance of the Lustre parallel file system.
Int. J. High Perform. Syst. Archit., 2011

Dynamic I/O Reconfiguration for a NFS-Based Parallel File System.
Proceedings of the 19th International Euromicro Conference on Parallel, 2011

Improving Performance on Atmospheric Models through a Hybrid OpenMP/MPI Implementation.
Proceedings of the IEEE International Symposium on Parallel and Distributed Processing with Applications, 2011

Using Memory Access Traces to Map Threads and Data on Hierarchical Multi-core Platforms.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Combining Multiple Metrics to Control BSP Process Rescheduling in Response to Resource and Application Dynamics.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Observing the Impact of Multiple Metrics and Runtime Adaptations on BSP Process Rescheduling.
Parallel Process. Lett., 2010

Triva: Interactive 3D visualization for performance analysis of parallel applications.
Future Gener. Comput. Syst., 2010

Preface to CIESC 2009 Special Issue.
CLEI Electron. J., 2010

Applying Process Migration on a BSP-Based LU Decomposition Application.
Proceedings of the High Performance Computing for Computational Science - VECPAR 2010, 2010

A Comparative Analysis of Load Balancing Algorithms Applied to a Weather Forecast Model.
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

I/O Performance Evaluation on Multicore Clusters with Atmospheric Model Environment.
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing Workshops, 2010

Impact of I/O Coordination on a NFS-Based Parallel File System with Dynamic Reconfiguration.
Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

A new technique for data privatization in user-level threads and its use in parallel applications.
Proceedings of the 2010 ACM Symposium on Applied Computing (SAC), 2010

Challenges and Issues of Supporting Task Parallelism in MPI.
Proceedings of the Recent Advances in the Message Passing Interface, 2010

Parallel Shared-Memory Workloads Performance on Asymmetric Multi-core Architectures.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

Impact of Parallel Workloads on NoC Architecture Design.
Proceedings of the 18th Euromicro Conference on Parallel, 2010

Supporting performance and adaptivity on BSP process rescheduling.
Proceedings of the 15th IEEE Symposium on Computers and Communications, 2010

TLP and ILP exploitation through a reconfigurable multiprocessor system.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Supporting Malleability in Parallel Architectures with Dynamic CPUSETs Mapping and Dynamic MPI.
Proceedings of the Distributed Computing and Networking, 11th International Conference, 2010

Evaluating Thread Placement Based on Memory Access Patterns for Multi-core Processors.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Optimizing an MPI weather forecasting model via processor virtualization.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

Parallel Lattice Boltzmann Method with Blocked Partitioning.
Int. J. Parallel Program., 2009

Visual Mapping of Program Components to Resources Representation: A 3D Analysis of Grid Parallel Applications.
Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009

Performance Evaluation of NoC Architectures for Parallel Workloads.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

Design of a Grid workflow for a climate application.
Proceedings of the 14th IEEE Symposium on Computers and Communications (ISCC 2009), 2009

Multi-core aware process mapping and its impact on communication overhead of parallel applications.
Proceedings of the 14th IEEE Symposium on Computers and Communications (ISCC 2009), 2009

Design of Interleaved Multithreading for Network Processors on Chip.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2009), 2009

Applying Processes Rescheduling over Irregular BSP Application.
Proceedings of the Computational Science, 2009

MigBSP: A Novel Migration Model for Bulk-Synchronous Parallel Processes Rescheduling.
Proceedings of the 11th IEEE International Conference on High Performance Computing and Communications, 2009

On the design of reconfigurable crossbar switch for adaptable on-chip topologies in programmable NoC routers.
Proceedings of the 19th ACM Great Lakes Symposium on VLSI 2009, 2009

Towards Visualization Scalability through Time Intervals and Hierarchical Organization of Monitoring Data.
Proceedings of the 9th IEEE/ACM International Symposium on Cluster Computing and the Grid, 2009

Controlling Processes Reassignment in BSP Applications.
Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

3D approach to the visualization of parallel applications and Grid monitoring information.
Proceedings of the 9th IEEE/ACM International Conference on Grid Computing (Grid 2008), Tsukuba, Japan, September 29, 2008

NOC architecture design for multi-cluster chips.
Proceedings of the FPL 2008, 2008

ICE: Managing Multiple Clusters Using Web Services.
Proceedings of the 11th IEEE International Conference on Computational Science and Engineering, 2008

A High-Throughput Multi-cluster NoC Architecture.
Proceedings of the 11th IEEE International Conference on Computational Science and Engineering, 2008

Limits for a feasible speculative trace reuse implementation.
Int. J. High Perform. Syst. Archit., 2007

Automatic heart localization in ultrasound fetal images.
Proceedings of the Second International Conference on Computer Vision Theory and Applications (VISAPP 2007), Barcelona, Spain, March 8-11, 2007

On-line Scheduling of MPI-2 Programs with Hierarchical Work Stealing.
Proceedings of the 19th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2007), 2007

Evaluating Network-on-Chip for Homogeneous Embedded Multiprocessors in FPGAs.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2007), 2007

The Use of Artificial Neural Networks in the Speech Understanding Model - SUM.
Proceedings of the Artificial Neural Networks, 2007

Processing Mesoscale Climatology in a Grid Environment.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Metaserver Locality and Scalability in a Distributed NFS.
Proceedings of the High Performance Computing for Computational Science, 2006

A Speculative Trace Reuse Architecture with Reduced Hardware Requirements.
Proceedings of the 18th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2006), 2006

Improving the Dynamic Creation of Processes in MPI-2.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

A Model to Computational Speech Understanding.
Proceedings of the Computational Processing of the Portuguese Language, 2006

Scheduling Dynamically Spawned Processes in MPI-2.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2006

A Connectionist Approach to Speech Understanding.
Proceedings of the International Joint Conference on Neural Networks, 2006

DIMVisual: Data Integration Model for Visualization of Parallel Programs Behavior.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

ICE: A Service Oriented Approach to Uniform the Access and Management of Cluster Environments.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

Computational Model of Speech Understanding.
Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, 2006

Reusing Traces in a Dynamic Conditional Execution Architecture.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Asynchronous Communication in Java over Infiniband and DECK.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Branch Prediction Topologies for SMT Architectures.
Proceedings of the 17th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2005), 2005

Cluster and network management integration: an SNMP-based Solution.
Proceedings of the ICETE 2005, 2005

Evaluating the performance of the dNFSP file system.
Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005

Parallel Computational Model with Dynamic Load Balancing in PC Clusters.
Proceedings of the High Performance Computing for Computational Science, 2004

Value Predictors for Reuse through Speculation on Traces.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Performance Evaluation of a Prototype Distributed NFS Server.
Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

High Performance Cluster Management Based on SNMP: Experiences on Integration Between Network Patterns and Cluster Management Concepts.
Proceedings of the Telecommunications and Networking, 2004

Performance Analysis of DECK Collective Communication Service.
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003

Complex Branch Profiling for Dynamic Conditional Execution.
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003

Dynamic Load Balancing in PC Clusters: An Application to a Multi-Physics Model.
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003

Parallelization of Krylov Subspace Methods in Multiprocessor PC Clusters.
Proceedings of the Parallel Computing: Software Technology, 2003

An Oscillatory Neural Network for Image Segmentation.
Proceedings of the Progress in Pattern Recognition, 2003

Echocardiographic Image Sequence Segmentation and Analysis Using Self-Organizing Maps.
J. VLSI Signal Process., 2002

Architecture of Oscillatory Neural Network for Image Segmentation.
Proceedings of the 14th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2002), 2002

Parallelizing Conjugate Gradient Method for Clusters Using MPI and Threads.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

An Evaluation of Simple and Efficient Optimization Techniques for Matrix Multiplication.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Message-passing Over Shared Memory for the DECK Programming Environment.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2002

Improving SMT Performance Scheduling Processes.
Proceedings of the 10th Euromicro Workshop on Parallel, 2002

Segmentation of TEM Images Using Oscillatory Neural Networks.
Proceedings of the 14th Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI 2001), 2001

Evaluating the Effects of Branch Prediction Accuracy on the Performance of SMT Architectures.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

DECK-SCI: High-Performance Communication and Multithreading for SCI Clusters.
Proceedings of the 2001 IEEE International Conference on Cluster Computing (CLUSTER 2001), 2001

Fetal Left Atrium Segmentation using Kohonen Maps to Measure the Septum Primum Redundancy Index.
Proceedings of the 6th Brazilian Symposium on Neural Networks (SBRN 2000), 2000

DPC++: Object-Oriented Programming Applied to Cluster Computing.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

Distributed Processor Allocation in Mesh-Connected Multicomputers.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

A Selection Mechanism to Group Processes in a Parallel Debugger.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 2000

The MultiCluster Model to the Integrated Use of Multiple Workstation Clusters.
Proceedings of the Parallel and Distributed Processing, 2000

Distributed Processor Allocation in Large PC Clusters.
Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing, 2000

Distributed Processor Allocation in Multicomputers.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

Analysing a Multistreamed Superscalar Speculative Fetch Mechanism.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

High performance with high accuracy laboratory.
RITA, 1996

Performance evaluation in image processing with GAPP array processor.
Microprocess. Microprogramming, 1995

Flexible Kernel: The AURORA Approach for Multiprocessor Operating System.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1995

Os processos de compilação e execução em AURORA.
Proceedings of the 7th Brazilian Symposium on Software Engineering, 1993

SARA: A processor interconnection performance analysis tool.
Microprocess. Microprogramming, 1988

SSIP - A Processor Interconnection Simulator.
Proceedings of the Parallel and Large-Scale Computers: Performance, 1982

Data Base Processor MAGE.
Proceedings of the Papers of the Fifth Workshop on Computer Architecture for Non-Numeric Processing, 1980

Processeur base de données MAGE : aspect matériel.
PhD thesis, 1979
