Alex Ramírez

According to our database, Alex Ramírez authored at least 123 papers between 1999 and 2021.

Bibliography

2021

2019
The Abstract Streaming Machine: Compile-Time Performance Modelling of Stream Programs on Heterogeneous Multiprocessors.
Trans. High Perform. Embed. Archit. Compil., 2019

2018
vbench: Benchmarking Video Transcoding in the Cloud.
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, 2018

2017
Adaptive Runtime-Assisted Block Prefetching on Chip-Multiprocessors.
Int. J. Parallel Program., 2017

Beyond the socket: NUMA-aware GPUs.
Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017

Sharing the instruction cache among lean cores on an asymmetric CMP for HPC applications.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

2016
Rebalancing the core front-end through HPC code analysis.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

2015
Designing Efficient Heterogeneous Memory Architectures.
IEEE Micro, 2015

Limpio: LIghtweight MPI instrumentatiOn.
Proceedings of the 2015 IEEE 23rd International Conference on Program Comprehension, 2015

Exploring multiple sleep modes in on/off based energy efficient HPC networks.
Proceedings of the 33rd IEEE International Conference on Computer Design, 2015

2014
Tibidabo: Making the case for an ARM-based HPC system.
Future Gener. Comput. Syst., 2014

Enabling preemptive multiprogramming on GPUs.
Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

A performance perspective on energy efficient HPC links.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Author retrospective for software trace cache.
Proceedings of the ACM International Conference on Supercomputing 25th Anniversary Volume, 2014

Evaluating Execution Time Predictability of Task-Based Programs on Multi-Core Processors.
Proceedings of the Euro-Par 2014: Parallel Processing Workshops, 2014

2013
The low power architecture approach towards exascale computing.
J. Comput. Sci., 2013

Energy efficiency vs. performance of the numerical solution of PDEs: An application study on a low-power ARM-based cluster.
J. Comput. Phys., 2013

Supercomputing with commodity CPUs: are mobile SoCs ready for HPC?
Proceedings of the International Conference for High Performance Computing, 2013

Parallelizing general histogram application for CUDA architectures.
Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Power/performance evaluation of energy efficient Ethernet (EEE) for High Performance Computing.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Trace filtering of multithreaded applications for CMP memory simulation.
Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Programmable and Scalable Reductions on Clusters.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Data placement in HPC architectures with heterogeneous off-chip memory.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Experiences with mobile processors for energy efficient HPC.
Proceedings of the Design, Automation and Test in Europe, 2013

2012
Scalable Parallel Programming Applied to H.264/AVC Decoding.
Springer Briefs in Computer Science, Springer, ISBN: 978-1-4614-2230-3, 2012

DMA++: On the Fly Data Realignment for On-Chip Memories.
IEEE Trans. Computers, 2012

On the simulation of large-scale architectures using multiple application abstraction levels.
ACM Trans. Archit. Code Optim., 2012

ReLA, a local alignment search tool for the identification of distal and proximal gene regulatory regions and their conserved transcription factor binding sites.
Bioinform., 2012

Kernel Partitioning of Streaming Applications: A Statistical Approach to an NP-complete Problem.
Proceedings of the 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012

Topic 16: GPU and Accelerators Computing.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

2011
Dynamic Cache Partitioning Based on the MLP of Cache Misses.
Trans. High Perform. Embed. Archit. Compil., 2011

A Highly Scalable Parallel Implementation of H.264.
Trans. High Perform. Embed. Archit. Compil., 2011

Simulating Whole Supercomputer Applications.
IEEE Micro, 2011

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.
Int. J. Parallel Program., 2011

Scalable multicore architectures for long DNA sequence comparison.
Concurr. Comput. Pract. Exp., 2011

Breaking the bandwidth wall in chip multiprocessors.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Supercomputing: Past, present, and a possible future.
Proceedings of the 2011 International Conference on Embedded Computer Systems: Architectures, 2011

Trace-driven simulation of multithreaded applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2011

On the memory system requirements of future scientific applications: Four case-studies.
Proceedings of the 2011 IEEE International Symposium on Workload Characterization, 2011

FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Scaling HMMER Performance on Multicore Architectures.
Proceedings of the International Conference on Complex, 2011

Parametrizing multicore architectures for multiple sequence alignment.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Scalability Evaluation of a Polymorphic Register File: A CG Case Study.
Proceedings of the Architecture of Computing Systems - ARCS 2011, 2011

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory.
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

2010
Advancing Computational Science, Visualization and Homeland Security Research/ Education at Minority Serving Institutions National Model Promoted/ Implemented by MSI-CIEC (Minority Serving Institutions-CyberInfrastructure Empowerment Coalition).
Proceedings of the International Conference on Computational Science, 2010

The SARC Architecture.
IEEE Micro, 2010

ArchExplorer for Automatic Design Space Exploration.
IEEE Micro, 2010

A Polymorphic Register File for matrix operations.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Interleaving granularity on high bandwidth memory architecture for CMPs.
Proceedings of the 2010 International Conference on Embedded Computer Systems: Architectures, 2010

Task Superscalar: An Out-of-Order Task Pipeline.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

Can Manycores Support the Memory Requirements of Scientific Applications?
Proceedings of the Computer Architecture, 2010

Comparing last-level cache designs for CMP architectures.
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies, 2010

Buffer Sizing for Self-timed Stream Programs on Heterogeneous Distributed Memory Multiprocessors.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Long DNA Sequence Comparison on Multicore Architectures.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Starsscheck: A Tool to Find Errors in Task-Based Parallel Programs.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Empowering Business Students - Using Web 2.0 Tools in the Classroom.
Proceedings of the CSEDU 2010 - Second International Conference on Computer Supported Education, Valencia, Spain, April 7-10, 2010

Scalability Analysis of Progressive Alignment on a Multicore.
Proceedings of the CISIS 2010, 2010

2009
Parallel Scalability of Video Decoders.
J. Signal Process. Syst., 2009

DIA: A Complexity-Effective Decoding Architecture.
IEEE Trans. Computers, 2009

Available task-level parallelism on the Cell BE.
Sci. Program., 2009

CellSs: Scheduling techniques to better exploit memory hierarchy.
Sci. Program., 2009

FlexDCP: a QoS framework for CMP architectures.
ACM SIGOPS Oper. Syst. Rev., 2009

Evaluación del rendimiento paralelo en el nivel macro bloque del decodificador H.264 en una arquitectura multiprocesador cc-NUMA.
Rev. Avances en Sistemas Informática, 2009

Thread to Core Assignment in SMT On-Chip Multiprocessors.
Proceedings of the 21st International Symposium on Computer Architecture and High Performance Computing, 2009

Scalability of Macroblock-level Parallelism for H.264 Decoding.
Proceedings of the 15th IEEE International Conference on Parallel and Distributed Systems, 2009

Parallel H.264 Decoding on an Embedded Multicore Processor.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Quantitative analysis of sequence alignment applications on multiprocessor architectures.
Proceedings of the 6th Conference on Computing Frontiers, 2009

Mapping stream programs onto heterogeneous multiprocessor systems.
Proceedings of the 2009 International Conference on Compilers, 2009

2008
Multicore Resource Management.
IEEE Micro, 2008

Preliminary Analysis of the Cell BE Processor Limitations for Sequence Alignment Applications.
Proceedings of the Embedded Computer Systems: Architectures, 2008

Analysis of video filtering on the cell processor.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2008), 2008

MFLUSH: Handling Long-Latency Loads in SMT On-Chip Multiprocessors.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

MLP-Aware Dynamic Cache Partitioning.
Proceedings of the High Performance Embedded Architectures and Compilers, 2008

2007
High-Performance Embedded Architecture and Compilation Roadmap.
Trans. High Perform. Embed. Archit. Compil., 2007

Enlarging Instruction Streams.
IEEE Trans. Computers, 2007

Explaining Dynamic Cache Partitioning Speed Ups.
IEEE Comput. Archit. Lett., 2007

Online Prediction of Applications Cache Utility.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

On the Problem of Minimizing Workload Execution Time in SMT Processors.
Proceedings of the 2007 International Conference on Embedded Computer Systems: Architectures, 2007

A Streaming Machine Description and Programming Model.
Proceedings of the Embedded Computer Systems: Architectures, 2007

Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications.
Proceedings of the 2007 IEEE International Symposium on Performance Analysis of Systems and Software, 2007

HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications.
Proceedings of the IEEE 10th International Symposium on Workload Characterization, 2007

2006
Predictable Performance in SMT Processors: Synergy between the OS and SMTs.
IEEE Trans. Computers, 2006

Performance Analysis of Sequence Alignment Applications.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

Branch predictor guided instruction decoding.
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006

2005
Software Trace Cache.
IEEE Trans. Computers, 2005

Better Branch Prediction Through Prophet/Critic Hybrids.
IEEE Micro, 2005

On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005

Multiple Stream Prediction.
Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Effective Instruction Prefetching via Fetch Prestaging.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

A Complexity-Effective Simultaneous Multithreading Architecture.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Architectural support for real-time task scheduling in SMT processors.
Proceedings of the 2005 International Conference on Compilers, 2005

2004
A low-complexity fetch architecture for high-performance superscalar processors.
ACM Trans. Archit. Code Optim., 2004

QoS for High-Performance SMT Processors in Embedded Systems.
IEEE Micro, 2004

A latency-conscious SMT branch prediction architecture.
Int. J. High Perform. Comput. Netw., 2004

Optimising long-latency-load-aware fetch policies for SMT processors.
Int. J. High Perform. Comput. Netw., 2004

Dynamically Controlled Resource Allocation in SMT Processors.
Proceedings of the 37th Annual International Symposium on Microarchitecture (MICRO-37 2004), 2004

Prophet/Critic Hybrid Branch Prediction.
Proceedings of the 31st International Symposium on Computer Architecture (ISCA 2004), 2004

DCache Warn: An I-Fetch Policy to Increase SMT Efficiency.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

A Low-Complexity, High-Performance Fetch Unit for Simultaneous Multithreading Processors.
Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Enabling SMT for real-time embedded systems.
Proceedings of the 2004 12th European Signal Processing Conference, 2004

Feasibility of QoS for SMT.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

Implicit vs. Explicit Resource Allocation in SMT Processors.
Proceedings of the 2004 Euromicro Symposium on Digital Systems Design (DSD 2004), Architectures, Methods and Tools, 31 August, 2004

Predictable performance in SMT processors.
Proceedings of the First Conference on Computing Frontiers, 2004

Reducing Fetch Architecture Complexity Using Procedure Inlining.
Proceedings of the 8th Annual Workshop on Interaction between Compilers and Computer Architecture (INTERACT-8 2004), 2004

2003
Tolerating Branch Predictor Latency on SMT.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

Improving Memory Latency Aware Fetch Policies for SMT Processors.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

2002
High performance instruction fetch using software and hardware co-design.
PhD thesis, 2002

Software Trace Cache for Commercial Applications.
Int. J. Parallel Program., 2002

Fetching instruction streams.
Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

A Comprehensive Analysis of Indirect Branch Prediction.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

Studying New Ways for Improving Adaptive History Length Branch Predictors.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

A Comparative Study of Redundancy in Trace Caches (Research Note).
Proceedings of the Euro-Par 2002, 2002

2001
Instruction fetch architectures and code layout optimizations.
Proc. IEEE, 2001

Code layout optimizations for transaction processing workloads.
Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Branch Prediction Using Profile Data.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
Trace Cache Redundancy: Red & Blue Traces.
Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

On the Performance of Fetch Engines Running DSS Workloads.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

The Effect of Code Reordering on Branch Prediction.
Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00), 2000

1999
Software trace cache.
Proceedings of the 13th international conference on Supercomputing, 1999

Optimization of Instruction Fetch for Decision Support Workloads.
Proceedings of the International Conference on Parallel Processing 1999, 1999

