María Jesús Garzarán

  • Intel Corporation, USA

According to our database, María Jesús Garzarán authored at least 70 papers between 1998 and 2024.


Modeling and Simulation of Collective Algorithms on HPC Network Topologies using Structural Simulation Toolkit.
Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Enabling Multi-level Network Modeling in Structural Simulation Toolkit for Next-Generation HPC Network Design Space Exploration.
Proceedings of the High Performance Computing, 2023

Minimizing the usage of hardware counters for collective communication using triggered operations.
Parallel Comput., 2020

Efficient implementation of MPI-3 RMA over openFabrics interfaces.
Parallel Comput., 2019

Software combining to mitigate multithreaded MPI contention.
Proceedings of the ACM International Conference on Supercomputing, 2019

NoMap: Speeding-Up JavaScript Using Hardware Transactional Memory.
Proceedings of the 25th IEEE International Symposium on High Performance Computer Architecture, 2019

Exploiting social network graph characteristics for efficient BFS on heterogeneous chips.
J. Parallel Distributed Comput., 2018

Framework for scalable intra-node collective operations using shared memory.
Proceedings of the International Conference for High Performance Computing, 2018

Parallelizing MPI Using Tasks for Hybrid Programming Models.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

OpenMP® Runtime Instrumentation for Optimization.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

ShortCut: Architectural Support for Fast Object Access in Scripting Languages.
Proceedings of the 44th Annual International Symposium on Computer Architecture, 2017

Mapping Streaming Applications on Commodity Multi-CPU and GPU On-Chip Processors.
IEEE Trans. Parallel Distributed Syst., 2016

Autotuning Runtime Specialization for Sparse Matrix-Vector Multiplication.
ACM Trans. Archit. Code Optim., 2016

Breadth-First Search on Heterogeneous Platforms: A Case of Study on Social Networks.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

DSMR: a shared and distributed memory algorithm for single-source shortest path problem.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

DSMR: A Parallel Algorithm for Single-Source Shortest Path Problem.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Proceedings of the Workshop on High Performance Energy Efficient Embedded Systems (HIP3ES) 2015.
CoRR, 2015

Reducing overheads of dynamic scheduling on heterogeneous chips.
CoRR, 2015

Parallel Pipeline on Heterogeneous Multi-processing Architectures.
Proceedings of the 2015 IEEE TrustCom/BigDataSE/ISPA, 2015

Pipeline Template for Streaming Applications on Heterogeneous Chips.
Proceedings of the Parallel Computing: On the Road to Exascale, 2015

Adaptive Partitioning for Irregular Applications on Heterogeneous CPU-GPU Chips.
Proceedings of the International Conference on Computational Science, 2015

Understanding the Propagation of Error Due to a Silent Data Corruption in a Sparse Matrix Vector Multiply.
Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Optimal Parallelogram Selection for Hierarchical Tiling.
ACM Trans. Archit. Code Optim., 2014

Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction.
Proceedings of the International Conference for High Performance Computing, 2014

Evaluation of a Feature Tracking Vision Application on a Heterogeneous Chip.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Improving JavaScript performance by deconstructing the type system.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Directive-Based Compilers for GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

Optimization by runtime specialization for sparse matrix-vector multiplication.
Proceedings of the Generative Programming: Concepts and Experiences, 2014

Easy, fast, and energy-efficient object detection on heterogeneous on-chip architectures.
ACM Trans. Archit. Code Optim., 2013

Optimization techniques for efficient HTA programs.
Parallel Comput., 2012

Performance Portability with the Chapel Language.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Hierarchical overlapped tiling.
Proceedings of the 10th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2012

Scheduling of stream-based real-time applications for heterogeneous systems.
Proceedings of the ACM SIGPLAN/SIGBED 2011 conference on Languages, 2011

An Evaluation of Vectorizing Compilers.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

A Parallel Numerical Solver Using Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

ESoftCheck: Removal of Non-vital Checks for Fault Tolerance.
Proceedings of the CGO 2009, 2009

Optimization of tele-immersion codes.
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, 2009

Design Issues in Parallel Array Languages for Shared Memory.
Proceedings of the Embedded Computer Systems: Architectures, 2008

Programming with tiles.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

P-Ray: A Software Suite for Multi-core Architecture Characterization.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Efficient software checking for fault tolerance.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Automatic generation of a parallel sorting algorithm.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Techniques for Efficient Software Checking.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Optimizing Sorting with Machine Learning Algorithms.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

07361 Abstracts Collection -- Programming Models for Ubiquitous Parallelism.
Proceedings of the Programming Models for Ubiquitous Parallelism, 02.09. - 07.09.2007, 2007

07361 Introduction -- Programming Models for Ubiquitous Parallelism.
Proceedings of the Programming Models for Ubiquitous Parallelism, 02.09. - 07.09.2007, 2007

Compiler Optimizations for Fault Tolerance Software Checking.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

In search of a program generator to implement generic transformations for high-performance computing.
Sci. Comput. Program., 2006

Programming for parallelism and locality with hierarchically tiled arrays.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Design and Use of htalib - A Library for Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Hierarchically tiled arrays for parallelism and locality.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors.
ACM Trans. Archit. Code Optim., 2005

Is Search Really Necessary to Generate High-Performance BLAS?
Proc. IEEE, 2005

Optimizing Matrix Multiplication with a Classifier Learning System.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Analytic Models and Empirical Search: A Hybrid Approach to Code Optimization.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

A Language for the Compact Representation of Multiple Program Versions.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Optimizing Sorting with Genetic Algorithms.
Proceedings of the 3rd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2005), 2005

The Hierarchically Tiled Arrays programming approach.
Proceedings of the 7th Workshop on languages, 2004

Implementation of Parallel Numerical Algorithms Using Hierarchically Tiled Arrays.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

A Dynamically Tuned Sorting Library.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

A comparison of empirical and model-driven optimization.
Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation 2003, 2003

The Power of Belady's Algorithm in Register Allocation for Long Basic Blocks.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Using Software Logging to Support Multi-Version Buffering in Thread-Level Speculation.
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

SmartApps: An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Hardware Prefetching in Bus-Based Multiprocessors: Pattern Characterization and Cost-Effective Hardware.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

Removing architectural bottlenecks to the scalability of speculative parallelization.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Architectural Support for Parallel Reductions in Scalable Shared-Memory Multiprocessors.
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

Characterization and Improvement of Load/Store Cache-based Prefetching.
Proceedings of the 12th international conference on Supercomputing, 1998
