Eduard Ayguadé

ORCID: 0000-0002-5146-103X

Affiliations:
  • Polytechnic University of Catalonia, Barcelona, Spain


According to our database, Eduard Ayguadé authored at least 410 papers between 1989 and 2024.

Bibliography

2024
O(n) Key-Value Sort With Active Compute Memory.
IEEE Trans. Computers, May, 2024

DRAM Errors and Cosmic Rays: Space Invaders or Science Fiction?
CoRR, 2024

A Mess of Memory System Benchmarking, Simulation and Application Profiling.
CoRR, 2024

Aloe: A Family of Fine-tuned Open Healthcare LLMs.
CoRR, 2024


Reinforcement Learning-based Adaptive Mitigation of Uncorrected DRAM Errors in the Field.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

2023
Mitigating the NUMA effect on task-based runtime systems.
J. Supercomput., September, 2023

Assessing Saiph, a task-based DSL for high-performance computational fluid dynamics.
Future Gener. Comput. Syst., 2023

Accelerating SpMV on FPGAs Through Block-Row Compress: A Task-Based Approach.
Proceedings of the 33rd International Conference on Field-Programmable Logic and Applications, 2023

b8c: SpMV accelerator implementation leveraging high memory bandwidth.
Proceedings of the 31st IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2023

2022
Automated generation of High-Performance Computational Fluid Dynamics Codes.
J. Comput. Sci., 2022

The MAMe dataset: on the relevance of high resolution and variable shape image properties.
Appl. Intell., 2022

Automatic aggregation of subtask accesses for nested OpenMP-style tasks.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

OmpSs@cloudFPGA: An FPGA Task-Based Programming Model with Message Passing.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Transparent load balancing of MPI programs using OmpSs-2@Cluster and DLB.
Proceedings of the 51st International Conference on Parallel Processing, 2022

OmpSs-2@Cluster: Distributed Memory Execution of Nested OpenMP-style Tasks.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022


ecoHMEM: Improving Object Placement Methodology for Hybrid Memory Systems in HPC.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Implementation of a high-accuracy phase unwrapping algorithm using parallel-hybrid programming approach for displacement sensing using self-mixing interferometry.
J. Supercomput., 2021

OmpSs@FPGA Framework for High Performance FPGA Computing.
IEEE Trans. Computers, 2021

Size & Shape Matters: The Need of HPC Benchmarks of High Resolution Image Training for Deep Learning.
Supercomput. Front. Innov., 2021

The AXIOM Project: IoT on Heterogeneous Embedded Platforms.
IEEE Des. Test, 2021

Combining Dynamic Concurrency Throttling with Voltage and Frequency Scaling on Task-based Programming Models.
Proceedings of the ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9, 2021

Task-Based Programming Models for Heterogeneous Recurrent Workloads.
Proceedings of the Applied Reconfigurable Computing. Architectures, Tools, and Applications, 2021

2020
An Intelligent Iris Based Chronic Kidney Identification System.
Symmetry, 2020

Asynchronous runtime with distributed manager for task-based programming models.
Parallel Comput., 2020

Extending the OpenCHK Model with advanced checkpoint features.
Future Gener. Comput. Syst., 2020

Generating Efficient DNN-Ensembles with Evolutionary Computation.
CoRR, 2020

A Closer Look at Art Mediums: The MAMe Image Classification Dataset.
CoRR, 2020

GOPHER, an HPC Framework for Large Scale Graph Exploration and Inference.
Proceedings of the High Performance Computing, 2020

Cost-aware prediction of uncorrected DRAM errors in the field.
Proceedings of the International Conference for High Performance Computing, 2020

Breaking master-slave model between host and FPGAs.
Proceedings of the PPoPP '20: 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

Enhancing Resource Management Through Prediction-Based Policies.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

A Toolchain to Verify the Parallelization of OmpSs-2 Applications.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

Evaluating Worksharing Tasks on Distributed Environments.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
A Hardware Runtime for Task-Based Programming Models.
IEEE Trans. Parallel Distributed Syst., 2019

The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors.
Trans. High Perform. Embed. Archit. Compil., 2019

Sampled Simulation of Task-Based Programs.
IEEE Trans. Computers, 2019

Studying the impact of the Full-Network embedding on multimodal pipelines.
Semantic Web, 2019

PROFET: Modeling System Performance and Energy Without Simulating the CPU.
Proc. ACM Meas. Anal. Comput. Syst., 2019

Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems.
J. Parallel Distributed Comput., 2019

Trends on heterogeneous and innovative hardware and software systems.
J. Parallel Distributed Comput., 2019

On the maturity of parallel applications for asymmetric multi-core processors.
J. Parallel Distributed Comput., 2019

Resource-aware Elastic Swap Random Forest for Evolving Data Streams.
CoRR, 2019

Assembling a High-Productivity DSL for Computational Fluid Dynamics.
Proceedings of the Platform for Advanced Scientific Computing Conference, 2019

DRAM errors in the field: a statistical approach.
Proceedings of the International Symposium on Memory Systems, 2019

Worksharing Tasks: An Efficient Way to Exploit Irregular and Fine-Grained Loop Parallelism.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Random Forest as a Tumour Genetic Marker Extractor.
Proceedings of the Artificial Intelligence Research and Development, 2019

Feature Discriminativity Estimation in CNNs for Transfer Learning.
Proceedings of the Artificial Intelligence Research and Development, 2019

2018
Memory Controller for Vector Processor.
J. Signal Process. Syst., 2018

Asynchronous and Exact Forward Recovery for Detected Errors in Iterative Solvers.
IEEE Trans. Parallel Distributed Syst., 2018

Automated curation of brand-related social media images with deep learning.
Multim. Tools Appl., 2018

EMVS: Embedded Multi Vector-core System.
J. Syst. Archit., 2018

An approach to task-based parallel programming for undergraduate students.
J. Parallel Distributed Comput., 2018

On the Behavior of Convolutional Nets for Feature Extraction.
J. Artif. Intell. Res., 2018

Low-Precision Floating-Point Schemes for Neural Network Training.
CoRR, 2018

Formalization of Block Pruning: Reducing the Number of Cells Computed in Exact Biological Sequence Comparison Algorithms.
Comput. J., 2018

Peachy Parallel Assignments (EduHPC 2018).
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

Teaching HPC Systems and Parallel Programming with Small-Scale Clusters.
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

Mainstream vs. Emerging HPC: Metrics, Trade-Offs and Lessons Learned.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Graph partitioning applied to DAG scheduling to reduce NUMA effects.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Main memory latency simulation: the missing link.
Proceedings of the International Symposium on Memory Systems, 2018

Reducing Data Movement on Large Shared Memory Systems by Exploiting Computation Dependencies.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Runtime-Guided Management of Stacked DRAM Memories in Task Parallel Programs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

An Out-of-the-box Full-Network Embedding for Convolutional Neural Networks.
Proceedings of the 2018 IEEE International Conference on Big Knowledge, 2018

Application Acceleration on FPGAs with OmpSs@FPGA.
Proceedings of the International Conference on Field-Programmable Technology, 2018

HPC Benchmarking: Scaling Right and Looking Beyond the Average.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Saiph: Towards a DSL for High-Performance Computational Fluid Dynamics.
Proceedings of the Real World Domain Specific Languages Workshop, 2018

A Visual Distance for WordNet.
Proceedings of the Artificial Intelligence Research and Development, 2018

2017
Task Scheduling Techniques for Asymmetric Multi-Core Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Main Memory in HPC: Do We Need More or Could We Live with Less?
ACM Trans. Archit. Code Optim., 2017

Workflows for Science: a Challenge when Facing the Convergence of HPC and Big Data.
Supercomput. Front. Innov., 2017

The AXIOM platform for next-generation cyber physical systems.
Microprocess. Microsystems, 2017

Full-Network Embedding in a Multimodal Embedding Pipeline.
CoRR, 2017

Fluid Communities: A Community Detection Algorithm.
CoRR, 2017

An Out-of-the-box Full-network Embedding for Convolutional Neural Networks.
CoRR, 2017

Identifying the potential of Near Data Computing for Apache Spark.
CoRR, 2017

A visual embedding for the unsupervised extraction of abstract semantics.
Cogn. Syst. Res., 2017

Full-Network Embedding in a Multimodal Embedding Pipeline.
Proceedings of the 2nd Workshop on Semantic Deep Learning, 2017

Building Graph Representations of Deep Vector Embeddings.
Proceedings of the 2nd Workshop on Semantic Deep Learning, 2017

Extending OmpSs for OpenCL Kernel Co-Execution in Heterogeneous Systems.
Proceedings of the 29th International Symposium on Computer Architecture and High Performance Computing, 2017

Efficient exception handling support for GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Identifying the potential of near data processing for apache spark.
Proceedings of the International Symposium on Memory Systems, 2017

Adaptive and Architecture-Independent Task Granularity for Recursive Applications.
Proceedings of the Scaling OpenMP for Exascale Performance and Portability, 2017

General Purpose Task-Dependence Management Hardware for Task-Based Dataflow Programming Models.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Improving the Integration of Task Nesting and Dependencies in OpenMP.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Characterizing and Improving the Performance of Many-Core Task-Based Parallel Programming Runtimes.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Picos, A Hardware Task-Dependence Manager for Task-Based Dataflow Programming Models.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

A Directive-Based Approach to Perform Persistent Checkpoint/Restart.
Proceedings of the 2017 International Conference on High Performance Computing & Simulation, 2017

Exploiting Key-Value Data Stores Scalability for HPC.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

Efficient Data Sharing on Heterogeneous Systems.
Proceedings of the 46th International Conference on Parallel Processing, 2017

ParaView + Alya + D8tree: Integrating High Performance Computing and High Performance Data Analytics.
Proceedings of the International Conference on Computational Science, 2017

Fluid Communities: A Competitive, Scalable and Diverse Community Detection Algorithm.
Proceedings of the Complex Networks & Their Applications VI, 2017

Low-latency multi-threaded ensemble learning for dynamic big data streams.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
CUDAlign 4.0: Incremental Speculative Traceback for Exact Chromosome-Wide Alignment in GPU Clusters.
IEEE Trans. Parallel Distributed Syst., 2016

MASA: A Multiplatform Architecture for Sequence Aligners with Block Pruning.
ACM Trans. Parallel Comput., 2016

PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite.
ACM Trans. Archit. Code Optim., 2016

The AXIOM software layers.
Microprocess. Microsystems, 2016

Hierarchical Hyperlink Prediction for the WWW.
CoRR, 2016

Limitations and Alternatives for the Evaluation of Large-scale Link Prediction.
CoRR, 2016

Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study.
CoRR, 2016


MUSA: a multi-level simulation approach for next-generation HPC machines.
Proceedings of the International Conference for High Performance Computing, 2016

Large-Memory Nodes for Energy Efficient High-Performance Computing.
Proceedings of the Second International Symposium on Memory Systems, 2016

Multiple Target Task Sharing Support for the OpenMP Accelerator Model.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Supporting Adaptive Privatization Techniques for Irregular Array Reductions in Task-Parallel Programming Models.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Performance analysis of a hardware accelerator of dependence management for task-based dataflow programming models.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

TaskPoint: Sampled simulation of task-based programs.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

CATA: Criticality Aware Task Acceleration for Multicore Processors.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes.
Proceedings of the 2016 International Conference on Supercomputing, 2016

D8-tree: a de-normalized approach for multidimensional data analysis on key-value databases.
Proceedings of the 17th International Conference on Distributed Computing and Networking, 2016


On the Representativeness of Convolutional Neural Networks Layers.
Proceedings of the Artificial Intelligence Research and Development, 2016

User-generated content curation with deep convolutional neural networks.
Proceedings of the 2016 IEEE International Conference on Big Data (IEEE BigData 2016), 2016

Micro-Architectural Characterization of Apache Spark on Batch and Stream Processing Workloads.
Proceedings of the 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), 2016

Node architecture implications for in-memory data analytics on scale-in clusters.
Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, 2016

Echo State Hoeffding Tree Learning.
Proceedings of The 8th Asian Conference on Machine Learning, 2016

POSTER: Collective Dynamic Parallelism for Directive Based GPU Programming Languages and Compilers.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

POSTER: Exploiting Asymmetric Multi-Core Processors with Flexible System Software.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Hardware-Software Coherence Protocol for the Coexistence of Caches and Local Memories.
IEEE Trans. Computers, 2015

AMC: Advanced Multi-accelerator Controller.
Parallel Comput., 2015

DaSH: A benchmark suite for hybrid dataflow and shared memory programming models.
Parallel Comput., 2015

Extracting Visual Patterns from Deep Learning Representations.
CoRR, 2015

Tareador: a tool to unveil parallelization strategies at undergraduate level.
Proceedings of the Workshop on Computer Architecture Education, 2015

SSMART: smart scheduling of multi-architecture tasks on heterogeneous systems.
Proceedings of the Second Workshop on Accelerator Programming using Directives, 2015

Exploring dynamic parallelism in OpenMP.
Proceedings of the Second Workshop on Accelerator Programming using Directives, 2015

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers.
Proceedings of the International Conference for High Performance Computing, 2015

The AXIOM project (Agile, eXtensible, fast I/O Module).
Proceedings of the 2015 International Conference on Embedded Computer Systems: Architectures, 2015

Experiences of Using Cassandra for Molecular Dynamics Simulations.
Proceedings of the 23rd Euromicro International Conference on Parallel, 2015

Another Trip to the Wall: How Much Will Stacked DRAM Benefit HPC?
Proceedings of the 2015 International Symposium on Memory Systems, 2015

Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Towards Task-Parallel Reductions in OpenMP.
Proceedings of the OpenMP: Heterogenous Execution and Data Movements, 2015

Coherence protocol for transparent management of scratchpad memories in shared memory manycore architectures.
Proceedings of the 42nd Annual International Symposium on Computer Architecture, 2015

Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Self-Tuned Software-Managed Energy Reduction in InfiniBand Links.
Proceedings of the 21st IEEE International Conference on Parallel and Distributed Systems, 2015

AMA: Asynchronous Management of Accelerators for Task-based Programming Models.
Proceedings of the International Conference on Computational Science, 2015

Automatic Query Driven Data Modelling in Cassandra.
Proceedings of the International Conference on Computational Science, 2015

Auto-Tuning OmpSs-OpenCL Kernels Across GPU Machines.
Proceedings of the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and the 4th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms, 2015



Evaluating Link Prediction on Large Graphs.
Proceedings of the Artificial Intelligence Research and Development, 2015

How Data Volume Affects Spark Based Data Analytics on a Scale-up Server.
Proceedings of the Big Data Benchmarks, Performance Optimization, and Emerging Hardware, 2015

ViPS: Visual processing system for medical imaging.
Proceedings of the 8th International Conference on Biomedical Engineering and Informatics, 2015

Multimedia Big Data Computing for In-Depth Event Analysis.
Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, BigMM 2015, 2015

Spark deployment and performance evaluation on the MareNostrum supercomputer.
Proceedings of the 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29, 2015

Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server.
Proceedings of the Fifth IEEE International Conference on Big Data and Cloud Computing, 2015

Runtime-Guided Management of Scratchpad Memories in Multicore Architectures.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Runtime-Aware Architectures: A First Approach.
Supercomput. Front. Innov., 2014

PMSS: A programmable memory system and scheduler for complex memory patterns.
J. Parallel Distributed Comput., 2014

A methodology for the evaluation of high response time on E-commerce users and sales.
Inf. Syst. Frontiers, 2014

Automatic Exploration of Potential Parallelism in Sequential Applications.
Proceedings of the Supercomputing - 29th International Conference, 2014

Scalability and Parallel Execution of OmpSs-OpenCL Tasks on Heterogeneous CPU-GPU Environment.
Proceedings of the Supercomputing - 29th International Conference, 2014

A data flow language to develop high performance computing DSLs.
Proceedings of the Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2014

Leveraging OmpSs to Exploit Hardware Accelerators.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

A Case Study of Hybrid Dataflow and Shared-Memory Programming Models: Dependency-Based Parallel Game Engine.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Analyzing Performance Improvements and Energy Savings in Infiniband Architecture using Network Compression.
Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

PAMS: Pattern Aware Memory System for embedded systems.
Proceedings of the 2014 International Conference on ReConFigurable Computing and FPGAs, 2014

Fine-grain parallel megabase sequence comparison with multiple heterogeneous GPUs.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Towards the Cloudification of the Social Networks Analytics.
Proceedings of the Modeling Decisions for Artificial Intelligence, 2014

Towards Transactional Memory for OpenMP.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

On the Roles of the Programmer, the Compiler and the Runtime System When Programming Accelerators in OpenMP.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

Task-Parallel Reductions in OpenMP and OmpSs.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

Software-Managed Power Reduction in Infiniband Links.
Proceedings of the 43rd International Conference on Parallel Processing, 2014

Advanced Pattern based Memory Controller for FPGA based HPC applications.
Proceedings of the International Conference on High Performance Computing & Simulation, 2014

AMMC: Advanced Multi-Core Memory Controller.
Proceedings of the 2014 International Conference on Field-Programmable Technology, 2014

MAPC: Memory access pattern based controller.
Proceedings of the 24th International Conference on Field Programmable Logic and Applications, 2014

APMC: advanced pattern based memory controller (abstract only).
Proceedings of the 2014 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2014

Task-Based Programming with OmpSs and Its Application.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

Profit-aware cloud resource provisioner for ecommerce.
Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

DaSH: a benchmark suite for hybrid dataflow and shared memory programming models: with comparative evaluation of three hybrid dataflow models.
Proceedings of the Computing Frontiers Conference, CF'14, 2014

CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

Adaptive MapReduce Scheduling in Shared Environments.
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness.
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014

PVMC: Programmable Vector Memory Controller.
Proceedings of the IEEE 25th International Conference on Application-Specific Systems, 2014

Stand-Alone Memory Controller for Graphics System.
Proceedings of the Reconfigurable Computing: Architectures, Tools, and Applications, 2014

2013
Deadline-Based MapReduce Workload Management.
IEEE Trans. Netw. Serv. Manag., 2013

A Systematic Methodology to Generate Decomposable and Responsive Power Models for CMPs.
IEEE Trans. Computers, 2013

A template system for the efficient compilation of domain abstractions onto reconfigurable computers.
J. Syst. Archit., 2013

Programmability and portability for exascale: Top down programming methodology and tools with StarSs.
J. Comput. Sci., 2013

Counter-Based Power Modeling Methods: Top-Down vs. Bottom-Up.
Comput. J., 2013

Enabling Distributed Key-Value Stores with Low Latency-Impact Snapshot Support.
Proceedings of the 2013 IEEE 12th International Symposium on Network Computing and Applications, 2013

Self-Adaptive OmpSs Tasks in Heterogeneous Environments.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Implementing OmpSs support for regions of data in architectures with multiple address spaces.
Proceedings of the International Conference on Supercomputing, 2013

Aeneas: A Tool to Enable Applications to Effectively Use Non-Relational Databases.
Proceedings of the International Conference on Computational Science, 2013

Loop level speculation in a task based programming model.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Autonomic Placement of Mixed Batch and Transactional Workloads.
IEEE Trans. Parallel Distributed Syst., 2012

DMA++: On the Fly Data Realignment for On-Chip Memories.
IEEE Trans. Computers, 2012

Energy accounting for shared virtualized environments under DVFS using PMC-based power models.
Future Gener. Comput. Syst., 2012

POTRA: a framework for building power models for next generation multicore architectures.
Proceedings of the ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, 2012

Integrating Dataflow Abstractions into the Shared Memory Model.
Proceedings of the IEEE 24th International Symposium on Computer Architecture and High Performance Computing, 2012

OmpSs-OpenCL Programming Model for Heterogeneous Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

Productive Programming of GPU Clusters with OmpSs.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Assessing the Impact of Network Compression on Molecular Dynamics and Finite Element Methods.
Proceedings of the 14th IEEE International Conference on High Performance Computing and Communication & 9th IEEE International Conference on Embedded Software and Systems, 2012

Task-based parallel breadth-first search in heterogeneous environments.
Proceedings of the 19th International Conference on High Performance Computing, 2012

Optimizing resource utilization with software-based temporal multi-threading (stmt).
Proceedings of the 19th International Conference on High Performance Computing, 2012

PPMC: Hardware scheduling and memory management support for multi accelerators.
Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL), 2012

On the Instrumentation of OpenMP and OmpSs Tasking Constructs.
Proceedings of the Euro-Par 2012: Parallel Processing Workshops, 2012

Transactional Access to Shared Memory in StarSs, a Task Based Programming Model.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Topic 11: Multicore and Manycore Programming.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

DMA-circular: an enhanced high level programmable DMA controller for optimized management of on-chip local memories.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

BSArc: blacksmith streaming architecture for HPC accelerators.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

PPMC: A Programmable Pattern Based Memory Controller.
Proceedings of the Reconfigurable Computing: Architectures, Tools and Applications, 2012

Supporting stateful tasks in a dataflow graph.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Assessing Accelerator-Based HPC Reverse Time Migration.
IEEE Trans. Parallel Distributed Syst., 2011

Ompss: a Proposal for Programming Heterogeneous Multi-Core Architectures.
Parallel Process. Lett., 2011

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.
Int. J. Parallel Program., 2011

Local Memory Design Space Exploration for High-Performance Computing.
Comput. J., 2011

TARCAD: A template architecture for reconfigurable accelerator designs.
Proceedings of the IEEE 9th Symposium on Application Specific Processors, 2011

Hybrid Parallel Programming with MPI/StarSs.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Non-intrusive Estimation of QoS Degradation Impact on E-Commerce User Satisfaction.
Proceedings of The Tenth IEEE International Symposium on Networking Computing and Applications, 2011

Resource-Aware Adaptive Scheduling for MapReduce Clusters.
Proceedings of the Middleware 2011, 2011

Poster: programming clusters of GPUs with OMPSs.
Proceedings of the 25th International Conference on Supercomputing, 2011, Tucson, AZ, USA, May 31, 2011

Design space exploration for aggressive core replication schemes in CMPs.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Implementation of a Reverse Time Migration kernel using the HCE High Level Synthesis tool.
Proceedings of the 2011 International Conference on Field-Programmable Technology, 2011

Introduction.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Productive Cluster Programming with OmpSs.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

2010
Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture.
IEEE Trans. Parallel Distributed Syst., 2010

Guest Editors' Introduction.
Int. J. Parallel Program., 2010

Extending OpenMP to Survive the Heterogeneous Multi-Core Era.
Int. J. Parallel Program., 2010

Holistic Management for a more Energy-Efficient Cloud Computing.
ERCIM News, 2010

A survey on performance management for internet applications.
Concurr. Comput. Pract. Exp., 2010

Effective communication and computation overlap with hybrid MPI/SMPSs.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Transient Congestion Avoidance in Software Distributed Shared Memory Systems.
Proceedings of the 2010 International Conference on Parallel and Distributed Computing, 2010

Task Superscalar: An Out-of-Order Task Pipeline.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Optimizing the Exploitation of Multicore Processors and GPUs with OpenMP and OpenCL.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

A Proposal for User-Defined Reductions in OpenMP.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

An Extension to Improve OpenMP Tasking Control.
Proceedings of the Beyond Loop Level Parallelism in OpenMP: Accelerators, 2010

Characterization of workload and resource consumption for an online travel and booking site.
Proceedings of the 2010 IEEE International Symposium on Workload Characterization, 2010

Overlapping communication and computation by using a hybrid MPI/SMPSs approach.
Proceedings of the 24th International Conference on Supercomputing, 2010

Decomposable and responsive power models for multicore processors using performance counters.
Proceedings of the 24th International Conference on Supercomputing, 2010

Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters.
Proceedings of the 39th International Conference on Parallel Processing, 2010

A CellBE-based HPC Application for the Analysis of Vulnerabilities in Cryptographic Hash Functions.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Analysis of Task Offloading for Accelerators.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Accurate energy accounting for shared virtualized environments using PMC-based power modeling techniques.
Proceedings of the 2010 11th IEEE/ACM International Conference on Grid Computing, 2010

FEM: A Step Towards a Common Memory Layout for FPGA Based Accelerators.
Proceedings of the International Conference on Field Programmable Logic and Applications, 2010

Starsscheck: A Tool to Find Errors in Task-Based Parallel Programs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Reducing data access latency in SDSM systems using runtime optimizations.
Proceedings of the 2010 conference of the Centre for Advanced Studies on Collaborative Research, 2010

2009
The Design of OpenMP Tasks.
IEEE Trans. Parallel Distributed Syst., 2009

Guest Editors' Introduction.
Int. J. Parallel Program., 2009

A Proposal to Extend the OpenMP Tasking Model with Dependent Tasks.
Int. J. Parallel Program., 2009

Hierarchical Task-Based Programming With StarSs.
Int. J. High Perform. Comput. Appl., 2009

BSC Vision Towards Exascale.
Int. J. High Perform. Comput. Appl., 2009

Creating Power-Aware Middleware for Energy-Efficient Data Centres.
ERCIM News, 2009

OpenMP extensions for FPGA accelerators.
Proceedings of the 2009 International Conference on Embedded Computer Systems: Architectures, 2009

Atomic quake: using transactional memory in an interactive multiplayer game server.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Turbocharging boosted transactions or: how I learnt to stop worrying and love longer transactions.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Batch Job Profiling and Adaptive Profile Enforcement for Virtualized Environments.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Impact of the Memory Hierarchy on Shared Memory Architectures in Multicore Programming Models.
Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Achieving high memory performance from heterogeneous architectures with the SARC programming model.
Proceedings of the 10th workshop on MEmory performance, 2009

Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Unrolling Loops Containing Task Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures.
Proceedings of the Evolving OpenMP in an Age of Extreme Parallelism, 2009

QuakeTM: parallelizing a complex sequential application using transactional memory.
Proceedings of the 23rd international conference on Supercomputing, 2009

Barcelona OpenMP Tasks Suite: A Set of Benchmarks Targeting the Exploitation of Task Parallelism in OpenMP.
Proceedings of the ICPP 2009, 2009

Speeding Up Distributed MapReduce Applications Using Hardware Accelerators.
Proceedings of the ICPP 2009, 2009

CellMT: A cooperative multithreading library for the Cell/B.E.
Proceedings of the 16th International Conference on High Performance Computing, 2009

Exploiting memory customization in FPGA for 3D stencil computations.
Proceedings of the 2009 International Conference on Field-Programmable Technology, 2009

Introduction.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

An Extension of the StarSs Programming Model for Platforms with Multiple GPUs.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Mapping stream programs onto heterogeneous multiprocessor systems.
Proceedings of the 2009 International Conference on Compilers, 2009

OpenMP tasking analysis for programmers.
Proceedings of the 2009 conference of the Centre for Advanced Studies on Collaborative Research, 2009

2008
Nebelung: Execution Environment for Transactional OpenMP.
Int. J. Parallel Program., 2008

Guest Editors Introduction: Special Issue on OpenMP.
Int. J. Parallel Program., 2008

A hybrid connector for efficient web servers.
Int. J. High Perform. Comput. Netw., 2008

Power-efficient VLIW design using clustering and widening.
Int. J. Embed. Syst., 2008

Dynamic CPU provisioning for self-managed secure web applications in SMP hosting platforms.
Comput. Networks, 2008

An adaptive cut-off for task parallelism.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Utility-based placement of dynamic Web applications with fairness goals.
Proceedings of the IEEE/IFIP Network Operations and Management Symposium: Pervasive Management for Ubiquitous Networks and Services, 2008

Enabling Resource Sharing between Transactional and Batch Workloads Using Dynamic Application Placement.
Proceedings of the Middleware 2008, 2008

WormBench: a configurable workload for evaluating transactional memory systems.
Proceedings of the 9th workshop on MEmory performance, 2008

Evaluation of memory performance on the cell BE with the SARC programming model.
Proceedings of the 9th workshop on MEmory performance, 2008

Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Extending the OpenMP Tasking Model to Allow Dependent Tasks.
Proceedings of the OpenMP in a New Era of Parallelism, 4th International Workshop, 2008

Evaluation of OpenMP Task Scheduling Strategies.
Proceedings of the OpenMP in a New Era of Parallelism, 4th International Workshop, 2008

Understanding tuning complexity in multithreaded and hybrid web servers.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Improving Web Server Performance Through Main Memory Compression.
Proceedings of the 14th International Conference on Parallel and Distributed Systems, 2008

Tailoring Resources: The Energy Efficient Consolidation Strategy Goes Beyond Virtualization.
Proceedings of the 2008 International Conference on Autonomic Computing, 2008

Managing SLAs of heterogeneous workloads using dynamic application placement.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008

OpenMP tasks in IBM XL compilers.
Proceedings of the 2008 conference of the Centre for Advanced Studies on Collaborative Research, 2008

Hybrid access-specific software cache techniques for the cell BE architecture.
Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007
Transactional Memory: An Overview.
IEEE Micro, 2007

A Proposal for Error Handling in OpenMP.
Int. J. Parallel Program., 2007

Introduction.
Int. J. Parallel Program., 2007

Special Issue on OpenMP - Guest Editors' Introduction.
Int. J. Parallel Program., 2007

Designing an overload control strategy for secure e-commerce applications.
Comput. Networks, 2007

A Streaming Machine Description and Programming Model.
Proceedings of the Embedded Computer Systems: Architectures, 2007

Multithreaded software transactional memory and OpenMP.
Proceedings of the 2007 workshop on MEmory performance, 2007

Improving disk bandwidth-bound applications through main memory compression.
Proceedings of the 2007 workshop on MEmory performance, 2007

A Novel Asynchronous Software Cache Implementation for the Cell-BE Processor.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

An Experimental Evaluation of the New OpenMP Tasking Model.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Transactional Memory and OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

A Proposal for Task Parallelism in OpenMP.
Proceedings of the A Practical Programming Model for the Multi-Core Era, 2007

Support for OpenMP tasks in Nanos v4.
Proceedings of the 2007 conference of the Centre for Advanced Studies on Collaborative Research, 2007

2006
Running OpenMP applications efficiently on an everything-shared SDSM.
J. Parallel Distributed Comput., 2006

Employing nested OpenMP for the parallelization of multi-zone computational fluid dynamics applications.
J. Parallel Distributed Comput., 2006

Exploiting multilevel parallelism using OpenMP on a massive multithreaded architecture.
J. Embed. Comput., 2006

Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors.
IEEE Comput. Archit. Lett., 2006

Runtime Address Space Computation for SDSM Systems.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Techniques supporting threadprivate in OpenMP.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Topic 7: Parallel Computer Architecture and Instruction Level Parallelism.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005
Tuning Dynamic Web Applications using Fine-Grain Analysis.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

WAS Control Center: An Autonomic Performance-Triggered Tracing Environment for WebSphere.
Proceedings of the 13th Euromicro Workshop on Parallel, 2005

Experiences Parallelizing a Web Server with OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2005

Optimizing NANOS OpenMP for the IBM Cyclops Multithreaded Architecture.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Characterizing Secure Dynamic Web Applications Scalability.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Session-Based Adaptive Overload Control for Secure Dynamic Web Applications.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

A Hybrid Web Server Architecture for e-Commerce Applications.
Proceedings of the 11th International Conference on Parallel and Distributed Systems, 2005

A Hybrid Web Server Architecture for Secure e-Business Web Applications.
Proceedings of the High Performance Computing and Communications, 2005

2004
Register Constrained Modulo Scheduling.
IEEE Trans. Parallel Distributed Syst., 2004

Software and Hardware Techniques to Optimize Register File Utilization in VLIW Architectures.
Int. J. Parallel Program., 2004

Dynamic Memory Instruction Bypassing.
Int. J. Parallel Program., 2004

High-performance and low-power VLIW cores for numerical computations.
Int. J. High Perform. Comput. Netw., 2004

Performance and Power Evaluation of Clustered VLIW Processors with Wide Functional Units.
Proceedings of the Computer Systems: Architectures, 2004

Evaluating the Scalability of Java Event-Driven Web Servers.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

2003
Scaling non-regular shared-memory codes by reusing custom loop schedules.
Sci. Program., 2003

Automatic multilevel parallelization using OpenMP.
Sci. Program., 2003

Introduction.
Sci. Program., 2003

Is the Schedule Clause Really Necessary in OpenMP?
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Evaluation of OpenMP for the Cyclops Multithreaded Architecture.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2003

Complete instrumentation requirements for performance analysis of Web based technologies.
Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003

Power-Performance Trade-Offs in Wide and Clustered VLIW Cores for Numerical Codes.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Hierarchical Clustered Register File Organization for VLIW Processors.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Application/Kernel Cooperation Towards the Efficient Execution of Shared-Memory Parallel Java Codes.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

2002
Scheduler-Activated Dynamic Page Migration for Multiprogrammed DSM Multiprocessors.
J. Parallel Distributed Comput., 2002

Runtime vs. Manual Data Distribution for Architecture-Agnostic Shared-Memory Programming Models.
Int. J. Parallel Program., 2002

Dual-Level Parallelism Exploitation with OpenMP in Coastal Ocean Circulation Modeling.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Cost-Effective Compiler Directed Memory Prefetching and Bypassing.
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001
Static and Dynamic Locality Optimizations Using Integer Linear Programming.
IEEE Trans. Parallel Distributed Syst., 2001

A Framework for Integrating Data Alignment, Distribution, and Redistribution in Distributed Memory Multiprocessors.
IEEE Trans. Parallel Distributed Syst., 2001

Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures.
IEEE Trans. Computers, 2001

Lifetime-Sensitive Modulo Scheduling in a Production Environment.
IEEE Trans. Computers, 2001

New OpenMP directives for irregular data access loops.
Sci. Program., 2001

Exploiting memory affinity in OpenMP through schedule reuse.
SIGARCH Comput. Archit. News, 2001

Strategies for the efficient exploitation of loop-level parallelism in Java.
Concurr. Comput. Pract. Exp., 2001

A Study of Implicit Data Distribution Methods for OpenMP Using the SPEC Benchmarks.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Defining and Supporting Pipelined Executions in OpenMP.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Scaling irregular parallel codes with minimal programming effort.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Modulo scheduling with integrated register spilling for clustered VLIW architectures.
Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

MIRS: Modulo Scheduling with Integrated Register Spilling.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

A novel renaming mechanism that boosts software prefetching.
Proceedings of the 15th international conference on Supercomputing, 2001

The trade-off between implicit and explicit data distribution in shared-memory programming paradigms.
Proceedings of the 15th international conference on Supercomputing, 2001

Performance Analysis Tools for Parallel Java Applications on Shared-memory Systems.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Complex Pipelined Executions in OpenMP Parallel Applications.
Proceedings of the 2001 International Conference on Parallel Processing, 2001

Topic 08+13: Instruction-Level Parallelism and Computer Architecture.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
NanosCompiler: supporting flexible multilevel parallelism exploitation in OpenMP.
Concurr. Pract. Exp., 2000

Is Data Distribution Necessary in OpenMP?
Proceedings of Supercomputing 2000, 2000

Improved spill code generation for software pipelined loops.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

Two-level hierarchical register file organization for VLIW processors.
Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

UPMLIB: A Runtime System for Tuning the Memory Performance of OpenMP Programs on Scalable Shared-Memory Multiprocessors.
Proceedings of the Languages, 2000

OpenMP Extensions for Thread Groups and Their Run-Time Support.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Towards an efficient exploitation of loop-level parallelism in Java.
Proceedings of the ACM 2000 Java Grande Conference, San Francisco, CA, USA, 2000

Leveraging Transparent Data Distribution in OpenMP via User-Level Dynamic Page Migration.
Proceedings of the High Performance Computing, Third International Symposium, 2000

Applying Interposition Techniques for Performance Analysis of OpenMP Parallel Applications.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A case for user-level dynamic page migration.
Proceedings of the 14th international conference on Supercomputing, 2000

User-Level Dynamic Page Migration for Multiprogrammed Shared-Memory Multiprocessors.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

1999
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors.
Proceedings of the 13th international conference on Supercomputing, 1999

Increasing effective IPC by exploiting distant parallelism.
Proceedings of the 13th international conference on Supercomputing, 1999

An integer linear programming approach for optimizing cache locality.
Proceedings of the 13th international conference on Supercomputing, 1999

Impact on Performance of Fused Multiply-Add Units in Aggressive VLIW Architectures.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Quantifying the Benefits of SPECint Distant Parallelism in Simultaneous Multi-Threading Architectures.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998
Modulo Scheduling with Reduced Register Pressure.
IEEE Trans. Computers, 1998

Tools and Techniques for Automatic Data Layout: A Case Study.
Parallel Comput., 1998

Quantitative Evaluation of Register Pressure on Software Pipelined Loops.
Int. J. Parallel Program., 1998

Widening Resources: A Cost-effective Technique for Aggressive ILP Architectures.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Resource Widening Versus Replication: Limits and Performance-cost Trade-off.
Proceedings of the 12th international conference on Supercomputing, 1998

1997
High Performance Fortran Implementations: A Survey.
Sci. Program., 1997

DDT: A Research Tool for Automatic Data Distribution in High Performance Fortran.
Sci. Program., 1997

Exploiting Parallelism Through Directives on the Nano-Threads Programming Model.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

Analysis of Several Scheduling Algorithms under the Nano-Thread Programming Model.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

Increasing Memory Bandwidth with Wide Buses: Compiler, Hardware and Performance Trade-Offs.
Proceedings of the 11th international conference on Supercomputing, 1997

1996
Using a 0-1 Integer Programming Model for Automatic Static Data Distribution.
Parallel Process. Lett., 1996

A framework for automatic dynamic data mapping.
Proceedings of the Eighth IEEE Symposium on Parallel and Distributed Processing, 1996

Loop Parallelization: Revisiting Framework of Unimodular Transformations.
Proceedings of the 4th Euromicro Workshop on Parallel and Distributed Processing (PDP '96), 1996

Heuristics for Register-Constrained Software Pipelining.
Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture, 1996

Data Distribution and Loop Parallelization for Shared-Memory Multiprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

A Library Implementation of the Nano-Threads Programming Model.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

Swing modulo scheduling: a lifetime-sensitive approach.
Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995
Conflict-Free Access for Streams in Multimodule Memories.
IEEE Trans. Computers, 1995

Analyzing reference patterns in automatic data distribution tools.
Int. J. Parallel Program., 1995

A Novel Approach Towards Automatic Data Distribution.
Proceedings of Supercomputing '95, San Diego, CA, USA, December 4-8, 1995, 1995

Quantitative analysis of vector code.
Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995

Hypernode reduction modulo scheduling.
Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

Data Redistribution in an Automatic Data Distribution Tool.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Vector Multiprocessors with Arbitrated Memory Access.
Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

Non-Consistent Dual Register Files to Reduce Register Pressure.
Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Automatic generation of loop scheduling for VLIW.
Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994
Network Synchronization and Out-of-Order Access to Vectors.
Parallel Process. Lett., 1994

Access To Vectors In Multi-module Memories.
Proceedings of the Second Euromicro Workshop on Parallel and Distributed Processing, 1994

Detecting and Using Affinity in an Automatic Data Distribution Tool.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

Synchronized access to streams in SIMD vector multiprocessors.
Proceedings of the 8th international conference on Supercomputing, 1994

Memory Access Synchronization in Vector Multiprocessors.
Proceedings of the Parallel Processing: CONPAR 94, 1994

Using Sacks to Organize Registers in VLIW Machines.
Proceedings of the Parallel Processing: CONPAR 94, 1994

1993
Conflict-free access to streams in multiprocessor systems.
Microprocess. Microprogramming, 1993

Access to streams in multiprocessor systems.
Proceedings of the 1993 Euromicro Workshop on Parallel and Distributed Processing, 1993

Align and Distribute-based Linear Loop Transformations.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

Partitioning the Statement per Iteration Space Using Non-Singular Matrices.
Proceedings of the 7th international conference on Supercomputing, 1993

1992
Increasing the Number of Strides for Conflict-Free Vector Access.
Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

Conflict-free access of vectors with power-of-two strides.
Proceedings of the 6th international conference on Supercomputing, 1992

1991
Conflict-Free Strides for Vectors in Matched Memories.
Parallel Process. Lett., 1991

Scheduling in a continuous area-time design space.
Microprocess. Microprogramming, 1991

Balanced Loop Partitioning Using GTS.
Proceedings of the Languages and Compilers for Parallel Computing, 1991

On Automatic Loop Data-Mapping for Distributed-Memory Multiprocessors.
Proceedings of the Distributed Memory Computing, 2nd European Conference, 1991

1989
Paralelización automática de recurrencias en programas secuenciales numéricos.
PhD thesis, 1989

GTS: parallelization and vectorization of tight recurrences.
Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989, 1989

GTS: Extracting Full Parallelism Out of DO Loops.
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989

