Juan Fernández Peinador

Proceedings of the Handbook on Data Centers, 2015

2014

Selective dynamic serialization for reducing energy consumption in hardware transactional memory systems.

J. Supercomput., 2014

2013

Design of an efficient communication infrastructure for highly contended locks in many-core CMPs.

J. Parallel Distributed Comput., 2013

On the design of energy-efficient hardware transactional memory systems.

Concurr. Comput. Pract. Exp., 2013

ECONO: Express coherence notifications for efficient cache coherency in many-core CMPs.

Alberto Ros

Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Efficient Dir0B Cache Coherency for Many-Core CMPs.

Proceedings of the International Conference on Computational Science, 2013

Deploying Hardware Locks to Improve Performance and Energy Efficiency of Hardware Transactional Memory.

Proceedings of the Architecture of Computing Systems - ARCS 2013, 2013

2012

Efficient Hardware Barrier Synchronization in Many-Core CMPs.

IEEE Trans. Parallel Distributed Syst., 2012

Stencil computations on heterogeneous platforms for the Jacobi method: GPUs versus Cell BE.

J. Supercomput., 2012

The 2D wavelet transform on emerging architectures: GPUs and multicores.

J. Real Time Image Process., 2012

Dynamic Serialization: Improving Energy Consumption in Eager-Eager Hardware Transactional Memory Systems.

Proceedings of the 20th Euromicro International Conference on Parallel, 2012

Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs.

Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition, 2012

2011

GLocks: Efficient Support for Highly-Contended Locks in Many-Core CMPs.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

2010

Characterizing the basic synchronization and communication operations in Dual Cell-based Blades through CellStats.

J. Supercomput., 2010

Parallel 3D fast wavelet transform on manycore GPUs and multicore CPUs.

Proceedings of the International Conference on Computational Science, 2010

Characterizing Energy Consumption in Hardware Transactional Memory Systems.

Proceedings of the 22nd International Symposium on Computer Architecture and High Performance Computing, 2010

A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs.

Proceedings of the 39th International Conference on Parallel Processing, 2010

Efficient and scalable barrier synchronization for many-core CMPs.

Proceedings of the 7th Conference on Computing Frontiers, 2010

2009

A Parallel Implementation of the 2D Wavelet Transform Using CUDA.

Proceedings of the 17th Euromicro International Conference on Parallel, 2009

Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008

CellStats: A Tool to Evaluate the Basic Synchronization and Communication Operations of the Cell BE.

Proceedings of the 16th Euromicro International Conference on Parallel, 2008

Characterizing the Basic Synchronization and Communication Operations in Dual Cell-Based Blades.

Proceedings of the Computational Science, 2008

Multicore Platforms for Scientific Computing: Cell BE and NVIDIA Tesla.

Proceedings of the 2008 International Conference on Scientific Computing, 2008

2007

Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors.

Oreste Villa

Daniele Paolo Scarpazza

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell Broadband Engine.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

2006

STORM: Scalable Resource Management for Large-Scale Parallel Computers.

IEEE Trans. Computers, 2006

NIC-based reduction algorithms for large-scale clusters.

Adam Moody

Dhabaleswar K. Panda

Int. J. High Perform. Comput. Netw., 2006

An Abstract Interface for System Software on Large-Scale Clusters.

Comput. J., 2006

2005

Adaptive Parallel Job Scheduling with Flexible Coscheduling.

IEEE Trans. Parallel Distributed Syst., 2005

Assessing MPI Performance on QsNetII.

Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Monitoring and Debugging Parallel Software with BCS-MPI on Large-Scale Clusters.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

2004

On the Feasibility of Incremental Checkpointing for Scientific Computing.

Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Architectural Support for System Software on Large-Scale Clusters.

Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004

Designing Parallel Operating Systems via Parallel Programming.

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003

Scalable NIC-based Reduction on Large-scale Clusters.

Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

BCS-MPI: A New Approach in the System Software Design for Large-Scale Parallel Computers.

Proceedings of the ACM/IEEE SC2003 Conference on High Performance Networking and Computing, 2003

Parallel Job Scheduling under Dynamic Workloads.

Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003

Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources.

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Scalable collective communication on the ASCI Q machine.

Salvador Coll

Proceedings of the 11th Annual IEEE Symposium on High Performance Interconnects, 2003

2002

STORM: lightning-fast resource management.

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Improving the Performance of Real-Time Communication Services on High-Speed LANs under Topology Changes.

José M. García

José Duato

Proceedings of the 27th Annual IEEE Conference on Local Computer Networks (LCN 2002), 2002

Scalable Resource Management in High Performance Computers.

Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002

2001

Performance Evaluation of Real-Time Communication Services on High-Speed LANs under Topology Changes.
