Xavier Martorell
Orcid: 0000-0002-0417-3430
According to our database, Xavier Martorell authored at least 196 papers between 1995 and 2024.
Bibliography
2024
IEEE Trans. Computers, May, 2024
IEEE Trans. Computers, January, 2024
Future Gener. Comput. Syst., 2024
CoRR, 2024
Proceedings of the 16th Workshop on Rapid Simulation and Performance Evaluation for Design, 2024
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024
Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
Proceedings of the 14th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2024
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning: (Practical Experience Report).
Proceedings of the 19th European Dependable Computing Conference, 2024
Proceedings of the 27th Euromicro Conference on Digital System Design, 2024
2023
Proceedings of the International Conference on Field Programmable Technology, 2023
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023
Proceedings of the 38th Conference on Design of Circuits and Integrated Systems, 2023
2022
Towards EXtreme scale technologies and accelerators for euROhpc hw/Sw supercomputing applications for exascale: The TEXTAROSSA approach.
Microprocess. Microsystems, November, 2022
Analyzing the performance of hierarchical collective algorithms on ARM-based multicore clusters.
Proceedings of the 30th Euromicro International Conference on Parallel, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE/ACM International Workshop on Education for High Performance Computing, 2022
2021
IEEE Trans. Computers, 2021
An FPGA cached sparse matrix vector product (SpMV) for unstructured computational fluid dynamics simulations.
CoRR, 2021
Proceedings of the Euro-Par 2021: Parallel Processing, 2021
TEXTAROSSA: Towards EXtreme scale Technologies and Accelerators for euROhpc hw/Sw Supercomputing Applications for exascale.
Proceedings of the 24th Euromicro Conference on Digital System Design, 2021
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2021
2020
Parallel Comput., 2020
sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library).
J. Parallel Distributed Comput., 2020
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020
Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition, 2020
2019
J. Parallel Distributed Comput., 2019
CoRR, 2019
Proceedings of the High Performance Computing, 2019
Proceedings of the 27th Euromicro International Conference on Parallel, 2019
Proceedings of the 20th International Conference on Parallel and Distributed Computing, 2019
A Methodology Approach to Compare Performance of Parallel Programming Models for Shared-Memory Architectures.
Proceedings of the Numerical Computations: Theory and Algorithms, 2019
2018
Performance and energy effects on task-based parallelized applications - User-directed versus manual vectorization.
J. Supercomput., 2018
cuThomasBatch and cuThomasVBatch, CUDA Routines to compute batch of tridiagonal systems on NVIDIA GPUs.
Concurr. Comput. Pract. Exp., 2018
Concurr. Comput. Pract. Exp., 2018
Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms.
Comput. J., 2018
Proceedings of the 18th International Conference on Embedded Computer Systems: Architectures, 2018
MPI+OpenMP Tasking Scalability for the Simulation of the Human Brain: Human Brain Project.
Proceedings of the 25th European MPI Users' Group Meeting, 2018
Proceedings of the 26th Euromicro International Conference on Parallel, 2018
Proceedings of the 26th Euromicro International Conference on Parallel, 2018
Proceedings of the International Conference on Field-Programmable Technology, 2018
LEGaTO: towards energy-efficient, secure, fault-tolerant toolset for heterogeneous computing.
Proceedings of the 15th ACM International Conference on Computing Frontiers, 2018
Proceedings of the Reliable Software Technologies - Ada-Europe 2018, 2018
2017
ACM Trans. Comput. Syst., 2017
Microprocess. Microsystems, 2017
Proceedings of the 2017 International Symposium on Computer Architecture and High Performance Computing Workshops, 2017
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017
NVIDIA GPUs Scalability to Solve Multiple (Batch) Tridiagonal Systems Implementation of cuThomasBatch.
Proceedings of the Parallel Processing and Applied Mathematics, 2017
Implementation of the K-Means Algorithm on Heterogeneous Devices: A Use Case Based on an Industrial Dataset.
Proceedings of the Parallel Computing is Everywhere, 2017
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017
Characterizing and Improving the Performance of Many-Core Task-Based Parallel Programming Runtimes.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017
cuHinesBatch: Solving Multiple Hines systems on GPUs Human Brain Project.
Proceedings of the International Conference on Computational Science, 2017
Proceedings of the 2017 IEEE International Symposium on High Performance Computer Architecture, 2017
Proceedings of the Reliable Software Technologies - Ada-Europe 2017, 2017
Proceedings of the 1st Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, 2017
2016
CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters.
IEEE Trans. Parallel Distributed Syst., 2016
IEEE Trans. Parallel Distributed Syst., 2016
ACM Trans. Parallel Comput., 2016
Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs.
Parallel Comput., 2016
Proceedings of the International Conference for High Performance Computing, 2016
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016
Proceedings of the 24th Euromicro International Conference on Parallel, 2016
The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-Parallel Programming Models.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the Seventh International Green and Sustainable Computing Conference, 2016
Proceedings of the 2016 Euromicro Conference on Digital System Design, 2016
Proceedings of the 21st Asia and South Pacific Design Automation Conference, 2016
2015
Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories.
IEEE Trans. Computers, 2015
Supercomput. Front. Innov., 2015
Coarse-Grain Performance Estimator for Heterogeneous Parallel Computing Architectures like Zynq All-Programmable SoC.
CoRR, 2015
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015
Proceedings of the Parallel Processing and Applied Mathematics, 2015
Evaluating the Performance Impact of Communication Imbalance in Sparse Matrix-Vector Multiplication.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015
Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015
In search of the best MPI-OpenMP distribution for optimum Intel-MIC cluster performance.
Proceedings of the 2015 International Conference on High Performance Computing & Simulation, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 44th International Conference on Parallel Processing Workshops, 2015
Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms.
Proceedings of the 44th International Conference on Parallel Processing, 2015
Boosting irregular array Reductions through In-lined Block-ordering on fast processors.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015
Proceedings of the 2015 Euromicro Conference on Digital System Design, 2015
Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015
2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014
Analyzing the impact of programming models for efficient communication overlap in high-speed networks.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014
2013
A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013
Proceedings of the 21st IEEE/IFIP International Conference on VLSI and System-on-Chip, 2013
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013
Implementing OmpSs support for regions of data in architectures with multiple address spaces.
Proceedings of the International Conference on Supercomputing, 2013
Improving performance of all-to-all communication through loop scheduling in PGAS environments.
Proceedings of the International Conference on Supercomputing, 2013
Proceedings of the International Conference on Supercomputing, 2013
2012
Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Gener. Comput. Syst., 2012
POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012
Proceedings of the Languages and Compilers for Parallel Computing, 2012
Proceedings of the OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the 41st International Conference on Parallel Processing, 2012
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012
DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories.
Proceedings of the Computing Frontiers Conference, CF'12, 2012
Proceedings of the Center for Advanced Studies on Collaborative Research, 2012
Proceedings of the 2012 NASA/ESA Conference on Adaptive Hardware and Systems, 2012
2011
Parallel Process. Lett., 2011
Int. J. Parallel Program., 2011
Comput. J., 2011
Proceedings of the first workshop on Irregular applications: architectures and algorithm, 2011
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011
Proceedings of the IEEE International Conference on Computer Vision Workshops, 2011
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011
2010
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture.
IEEE Trans. Parallel Distributed Syst., 2010
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010
Proceedings of the Languages and Compilers for Parallel Computing, 2010
Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010
Proceedings of the High Performance Embedded Architectures and Compilers, 2010
Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010
Proceedings of the 2010 conference of the Centre for Advanced Studies on Collaborative Research, 2010
2009
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009
Achieving high memory performance from heterogeneous architectures with the SARC programming model.
Proceedings of the 10th workshop on MEmory performance, 2009
Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories.
Proceedings of the Languages and Compilers for Parallel Computing, 2009
Proceedings of the Languages and Compilers for Parallel Computing, 2009
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009
Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP.
Proceedings of the ICPP 2009, 2009
Proceedings of the 2009 conference of the Centre for Advanced Studies on Collaborative Research, 2009
2008
Proceedings of the 9th workshop on MEmory performance, 2008
Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008
Proceedings of the 2008 conference of the Centre for Advanced Studies on Collaborative Research, 2008
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008
2007
Trans. High Perform. Embed. Archit. Compil., 2007
Proceedings of the Embedded Computer Systems: Architectures, 2007
Proceedings of the Languages and Compilers for Parallel Computing, 2007
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007
Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007
Proceedings of the 2007 conference of the Centre for Advanced Studies on Collaborative Research, 2007
2006
J. Parallel Distributed Comput., 2006
Employing nested OpenMP for the parallelization of multi-zone computational fluid dynamics applications.
J. Parallel Distributed Comput., 2006
Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture.
J. Embed. Comput., 2006
Proceedings of the Languages and Compilers for Parallel Computing, 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
2005
IEEE Trans. Parallel Distributed Syst., 2005
Design and implementation of message-passing services for the Blue Gene/L supercomputer.
IBM J. Res. Dev., 2005
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 19th Annual International Conference on Supercomputing, 2005
2004
Page Migration with Dynamic Space-Sharing Scheduling Policies: The Case of the SGI O2000.
Int. J. Parallel Program., 2004
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2004
Proceedings of the Euro-Par 2004 Parallel Processing, 2004
2003
Parallel Process. Lett., 2003
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003
Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2003), 2003
MPI on BlueGene/L: Designing an Efficient General Purpose Messaging Solution for a Large Cellular System.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 10th European PVM/MPI Users' Group Meeting, Venice, Italy, September 29, 2003
Application/Kernel Cooperation Towards the Efficient Execution of Shared-Memory Parallel Java Codes.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Evaluation of the memory page migration influence in the system performance: the case of the SGI O2000.
Proceedings of the 17th Annual International Conference on Supercomputing, 2003
Proceedings of the Euro-Par 2003. Parallel Processing, 2003
2002
Dual-Level Parallelism Exploitation with OpenMP in Coastal Ocean Circulation Modeling.
Proceedings of the High Performance Computing, 4th International Symposium, 2002
2001
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001
Proceedings of the 15th international conference on Supercomputing, 2001
Proceedings of the 2001 International Conference on Parallel Processing, 2001
2000
Concurr. Pract. Exp., 2000
Proceedings of the Languages and Compilers for Parallel Computing, 2000
Proceedings of the Job Scheduling Strategies for Parallel Processing, IPDPS 2000 Workshop, 2000
Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000
1999
PhD thesis, 1999
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999
Proceedings of the International Conference on Parallel Processing 1999, 1999
1998
Experiences on implementing PARMACS macros to run the SPLASH-2 suite on multiprocessors.
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998
Proceedings of the 12th international conference on Supercomputing, 1998
1997
Proceedings of the Languages and Compilers for Parallel Computing, 1997
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997
1996
Proceedings of the Euro-Par '96 Parallel Processing, 1996
1995
The eXc Model: Scheduler-Activations on Mach 3.0.
Proceedings of the Seventh IASTED/ISMM International Conference on Parallel and Distributed Computing and Systems, 1995