Antonio González

Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

Performance analysis and predictability of the software layer in dynamic binary translators/optimizers.

Proceedings of the Computing Frontiers Conference, 2013

2012

The migration prefetcher: Anticipating data promotion in dynamic NUCA caches.

ACM Trans. Archit. Code Optim., 2012

Impact of positive bias temperature instability (PBTI) on 3T1D-DRAM cells.

Integr., 2012

A HW/SW Co-designed Programmable Functional Unit.

IEEE Comput. Archit. Lett., 2012

DDGacc: boosting dynamic DDG-based binary optimizations through specialized hardware support.

Proceedings of the 8th International Conference on Virtual Execution Environments, 2012

Improving the Resilience of an IDS against Performance Throttling Attacks.

Proceedings of the Security and Privacy in Communication Networks, 2012

Improving the Performance Efficiency of an IDS by Exploiting Temporal Locality in Network Traffic.

Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Exploiting temporal locality in network traffic using commodity multi-cores.

Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software, 2012

Setting an error detection infrastructure with low cost acoustic wave detectors.

Gaurang Upasani

Proceedings of the 39th International Symposium on Computer Architecture (ISCA 2012), 2012

Hardware/Software Mechanisms for Protecting an IDS against Algorithmic Complexity Attacks.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium Workshops & PhD Forum, 2012

A novel variation-tolerant 4T-DRAM cell with enhanced soft-error tolerance.

Proceedings of the 30th International IEEE Conference on Computer Design, 2012

Speculative dynamic vectorization for HW/SW co-designed processors.

Rakesh Kumar

Alejandro Martínez

Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011

CROB: Implementing a Large Instruction Window through Compression.

Trans. High Perform. Embed. Archit. Compil., 2011

Compiler Directed Issue Queue Energy Reduction.

Trans. High Perform. Embed. Archit. Compil., 2011

Implementing End-to-End Register Data-Flow Continuous Self-Test.

IEEE Trans. Computers, 2011

TRAMS Project: Variability and Reliability of SRAM Memories in sub-22 nm Bulk-CMOS Technologies.

Proceedings of the 2nd European Future Technologies Conference and Exhibition, 2011

Design of complex circuits using the Via-Configurable transistor array regular layout fabric.

Proceedings of the IEEE 24th International SoC Conference, SOCC 2011, Taipei, Taiwan, 2011

A Power-Efficient Co-designed Out-of-Order Processor.

Proceedings of the 23rd International Symposium on Computer Architecture and High Performance Computing, 2011

Accelerating microprocessor silicon validation by exposing ISA diversity.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Global productiveness propagation: a code optimization technique to speculatively prune useless narrow computations.

Proceedings of the ACM SIGPLAN/SIGBED 2011 conference on Languages, 2011

Thread shuffling: combining DVFS and thread migration to reduce energy consumptions for multi-core systems.

Proceedings of the 2011 International Symposium on Low Power Electronics and Design, 2011

A Performance and Area Efficient Architecture for Intrusion Detection Systems.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

New reliability mechanisms in memory design for sub-22nm technologies.

Proceedings of the 17th IEEE International On-Line Testing Symposium (IOLTS 2011), 2011

Dynamic fine-grain body biasing of caches with latency and leakage 3T1D-based monitors.

Proceedings of the IEEE 29th International Conference on Computer Design, 2011

Fg-STP: Fine-Grain Single Thread Partitioning on Multicores.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Hardware/software-based diagnosis of load-store queues using expandable activity logs.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Moore's law implications on energy reduction.

Proceedings of the High Performance Embedded Architectures and Compilers, 2011

Implementing a hybrid SRAM / eDRAM NUCA architecture.

Proceedings of the 18th International Conference on High Performance Computing, 2011

SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion.

Proceedings of the 8th Conference on Computing Frontiers, 2011

Beforehand Migration on D-NUCA Caches.

Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011

A Co-designed HW/SW Approach to General Purpose Program Acceleration Using a Programmable Functional Unit.

Proceedings of the 15th Workshop on Interaction between Compilers and Computer Architectures, 2011

2010

Processor Microarchitecture: An Implementation Perspective

Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, ISBN: 978-3-031-01729-2, 2010

Leveraging Register Windows to Reduce Physical Registers to the Bare Minimum.

IEEE Trans. Computers, 2010

Thread-management techniques to maximize efficiency in multicore and simultaneous multithreaded microprocessors.

ACM Trans. Archit. Code Optim., 2010

Energy efficiency via thread fusion and value reuse.

IET Comput. Digit. Tech., 2010

VCTA: A Via-Configurable Transistor Array regular fabric.

Proceedings of the 18th IEEE/IFIP VLSI-SoC 2010, 2010

A Dynamically Adaptable Hardware Transactional Memory.

Marc Lupon

Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

MT-SBST: Self-test optimization in multithreaded multicore architectures.

Proceedings of the 2010 IEEE International Test Conference, 2010

MODEST: a model for energy estimation under spatio-temporal variability.

Proceedings of the 2010 International Symposium on Low Power Electronics and Design, 2010

The auction: optimizing banks usage in Non-Uniform Cache Architectures.

Proceedings of the 24th International Conference on Supercomputing, 2010

High-Performance low-vcc in-order core.

Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

Circuit propagation delay estimation through multivariate regression-based modeling under spatio-temporal variability.

Proceedings of the Design, Automation and Test in Europe, 2010

2009

Selective replication: A lightweight technique for soft errors.

ACM Trans. Comput. Syst., 2009

Reducing Soft Errors through Operand Width Aware Policies.

IEEE Trans. Dependable Secur. Comput., 2009

AGAMOS: A Graph-Based Approach to Modulo Scheduling for Clustered Microarchitectures.

IEEE Trans. Computers, 2009

Energy-efficient register caching with compiler assistance.

Oguz Ergin

ACM Trans. Archit. Code Optim., 2009

Exploring the limits of early register release: Exploiting compiler analysis.

Oguz Ergin

ACM Trans. Archit. Code Optim., 2009

Low Vccmin fault-tolerant cache with highly predictable performance.

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42 2009), 2009

Boosting single-thread performance in multi-core systems through fine-grain multi-threading.

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

End-to-end register data-flow continuous self-test.

Proceedings of the 36th International Symposium on Computer Architecture (ISCA 2009), 2009

Online error detection and correction of erratic bits in register files.

Proceedings of the 15th IEEE International On-Line Testing Symposium (IOLTS 2009), 2009

Using Coherence Information and Decay Techniques to Optimize L2 Cache Leakage in CMPs.

Matteo Monchiero

Proceedings of the ICPP 2009, 2009

LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors.

Proceedings of the 27th International Conference on Computer Design, 2009

P-slice based efficient speculative multithreading.

Proceedings of the 16th International Conference on High Performance Computing, 2009

Last Bank: Dealing with Address Reuse in Non-Uniform Cache Architecture for CMPs.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Key Microarchitectural Innovations for Future Microprocessors.

Proceedings of the Architecture of Computing Systems, 2009

Anaphase: A Fine-Grain Thread Decomposition Scheme for Speculative Multithreading.

Proceedings of the PACT 2009, 2009

FASTM: A Log-based Hardware Transactional Memory with Fast Abort Recovery.

Marc Lupon

Proceedings of the PACT 2009, 2009

2008

Power/Performance/Thermal Design-Space Exploration for Multicore Architectures.

Matteo Monchiero

IEEE Trans. Parallel Distributed Syst., 2008

Mitosis: A Speculative Multithreaded Processor Based on Precomputation Slices.

Carlos Madriles

Carlos García Quiñones

IEEE Trans. Parallel Distributed Syst., 2008

Refueling: Preventing Wire Degradation due to Electromigration.

IEEE Micro, 2008

Version management alternatives for hardware transactional memory.

Marc Lupon

Proceedings of the 9th workshop on MEmory performance, 2008

Thread fusion.

Proceedings of the 2008 International Symposium on Low Power Electronics and Design, 2008

Efficient resources assignment schemes for clustered multithreaded processors.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A software-hardware hybrid steering mechanism for clustered microarchitectures.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

On-Line Failure Detection and Confinement in Caches.

Proceedings of the 14th IEEE International On-Line Testing Symposium (IOLTS 2008), 2008

Meeting points: using thread criticality to adapt multicore hardware to parallel regions.

Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008

2007

Understanding the Thermal Implications of Multi-Core Architectures.

IEEE Trans. Parallel Distributed Syst., 2007

Guest Editors' Introduction: Micro's Top Picks from the Microarchitecture Conferences.

Ronny Ronen

IEEE Micro, 2007

Reliability: Fallacy or Reality?

IEEE Micro, 2007

Penelope: The NBTI-Aware Processor.

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40 2007), 2007

Building a large instruction window through ROB compression.

Proceedings of the 2007 workshop on MEmory performance, 2007

Fuse: A Technique to Anticipate Failures due to Degradation in ALUs.

Proceedings of the 13th IEEE International On-Line Testing Symposium (IOLTS 2007), 2007

Improving Branch Prediction and Predicated Execution in Out-of-Order Processors.

Proceedings of the 13th International Conference on High-Performance Computer Architecture (HPCA-13 2007), 2007

Virtual Cluster Scheduling Through the Scheduling Graph.

Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Heterogeneous Clustered VLIW Microarchitectures.

Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Early Register Release for Out-of-Order Processors with Register Windows.

Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

2006

Control Speculation for Energy-Efficient Next-Generation Superscalar Processors.

Juan L. Aragón

José M. González

IEEE Trans. Computers, 2006

Impact of Parameter Variations on Circuits and Microarchitecture.

IEEE Micro, 2006

A dynamically reconfigurable cache for multithreaded processors.

J. Embed. Comput., 2006

Instruction scheduling for a clustered VLIW processor with a word-interleaved cache.

Concurr. Comput. Pract. Exp., 2006

Exploiting Narrow Values for Soft Error Tolerance.

IEEE Comput. Archit. Lett., 2006

Independent front-end and back-end dynamic voltage scaling for a GALS microarchitecture.

Proceedings of the 2006 International Symposium on Low Power Electronics and Design, 2006

Empowering a helper cluster through data-width aware instruction selection policies.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

SAMIE-LSQ: set-associative multiple-instruction entry load/store queue.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Selective predicate prediction for out-of-order processors.

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Design space exploration for multicore architectures: a power/performance/thermal view.

Matteo Monchiero

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Heterogeneous way-size cache.

Proceedings of the 20th Annual International Conference on Supercomputing, 2006

2005

On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures.

Julio Sahuquillo

José Duato

IEEE Trans. Parallel Distributed Syst., 2005

An accurate cost model for guiding data locality transformations.

ACM Trans. Program. Lang. Syst., 2005

Distributed Data Cache Designs for Clustered VLIW Processors.

IEEE Trans. Computers, 2005

IATAC: a smart predictor to turn-off L2 cache lines.

ACM Trans. Archit. Code Optim., 2005

Speculative execution for hiding memory latency.

Alex Pajuelo

SIGARCH Comput. Archit. News, 2005

Hardware support for early register release.

Int. J. High Perform. Comput. Netw., 2005

Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices.

Carlos García Quiñones

Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, 2005

Demystifying on-the-fly spill code.

Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, 2005

The Mitosis Speculative Multithreaded Architectures.

Carlos Madriles

Carlos García Quiñones

Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Reducing Misspeculation Penalty in Trace-Level Speculative Multithreaded Architectures.

Proceedings of the High-Performance Computing - 6th International Symposium, 2005

Control-Flow Independence Reuse via Dynamic Vectorization.

Alex Pajuelo

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Inherently Workload-Balanced Clustered Microarchitecture.

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Memory Bank Predictors.

Stefan Bieschewski

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Software Directed Issue Queue Power Reduction.

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Distributing the Frontend for Temperature Reduction.

Proceedings of the 11th International Conference on High-Performance Computer Architecture (HPCA-11 2005), 2005

Value Compression for Efficient Computation.

James E. Smith

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Compiler Directed Early Register Release.

Oguz Ergin

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

Variable-Based Multi-module Data Caches for Clustered VLIW Processors.

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

Compiler analysis for trace-level speculative multithreaded architectures.

Proceedings of the 9th Annual Workshop on Interaction between Compilers and Computer Architectures, 2005

2004

A fast and accurate framework to analyze and optimize cache memory behavior.

ACM Trans. Program. Lang. Syst., 2004

Late Allocation and Early Release of Physical Registers.

IEEE Trans. Computers, 2004

Thread Partitioning and Value Prediction for Exploiting Speculative Thread-Level Parallelism.

IEEE Trans. Computers, 2004

Removing communications in clustered microarchitectures through instruction replication.

ACM Trans. Archit. Code Optim., 2004

Cache organizations for clustered microarchitectures.

Proceedings of the 3rd Workshop on Memory Performance Issues, 2004

Back-end assignment schemes for clustered multithreaded processors.

Proceedings of the 18th Annual International Conference on Supercomputing, 2004

Frontend Frequency-Voltage Adaptation for Optimal Energy-Delay^2.

Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Thermal-Aware Clustered Microarchitectures.

Pedro Chaparro

Proceedings of the 22nd IEEE International Conference on Computer Design: VLSI in Computers & Processors (ICCD 2004), 2004

Low-Complexity Distributed Issue Queue.

Proceedings of the 10th International Conference on High-Performance Computer Architecture (HPCA-10 2004), 2004

Software-Controlled Operand-Gating.

James E. Smith

Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2003

Power- and Complexity-Aware Issue Queue Designs.

IEEE Micro, 2003

A framework for modeling and optimization of prescient instruction prefetch.

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Instruction Replication for Clustered Microarchitectures.

Proceedings of the 36th Annual International Symposium on Microarchitecture, 2003

Non redundant data cache.

Proceedings of the 2003 International Symposium on Low Power Electronics and Design, 2003

Dynamic Cluster Resizing.

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

On Reducing Register Pressure and Energy in Multiple-Banked Register Files.

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Power Efficient Data Cache Designs.

Proceedings of the 21st International Conference on Computer Design (ICCD 2003), 2003

Power-Aware Control Speculation through Selective Throttling.

Juan L. Aragón

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

Power-Aware Adaptive Issue Queue and Register File.

Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

Value Compression to Reduce Power in Data Caches.

Proceedings of the Euro-Par 2003. Parallel Processing, 2003

Local Scheduling Techniques for Memory Coherence in a Clustered VLIW Processor with a Distributed Data Cache.

Proceedings of the 1st IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2003), 2003

Optimizing Program Locality Through CMEs and GAs.

Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 27 September, 2003

2002

Hypercube Algorithms on Mesh Connected Multicomputers.

IEEE Trans. Parallel Distributed Syst., 2002

Errata on "Measuring Experimental Error in Microprocessor Simulation".

SIGARCH Comput. Archit. News, 2002

Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor.

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Near-Optimal Padding for Removing Conflict Misses.

Josep Llosa

Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

Speculative Dynamic Vectorization.

Alex Pajuelo

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

An interleaved cache clustered VLIW processor.

Proceedings of the 16th international conference on Supercomputing, 2002

A comparative study of modulo scheduling techniques.

Josep Llosa

Proceedings of the 16th international conference on Supercomputing, 2002

Dual path instruction processing.

Proceedings of the 16th international conference on Supercomputing, 2002

Near-Optimal Loop Tiling by Means of Cache Miss Equations and Genetic Algorithms.

Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002

Hardware Schemes for Early Register Release.

Proceedings of the 31st International Conference on Parallel Processing (ICPP 2002), 2002

Trace-Level Speculative Multithreaded Architecture.

Proceedings of the 20th International Conference on Computer Design (ICCD 2002), 2002

Thread-Spawning Schemes for Speculative Multithreading.

Proceedings of the Eighth International Symposium on High-Performance Computer Architecture (HPCA'02), 2002

Efficient Interconnects for Clustered Microarchitectures.

Julio Sahuquillo

José Duato

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning.

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques (PACT 2002), 2002

2001

Improving Latency Tolerance of Multithreading through Decoupling.

IEEE Trans. Computers, 2001

Lifetime-Sensitive Modulo Scheduling in a Production Environment.

IEEE Trans. Computers, 2001

Control-Flow Speculation through Value Prediction.

IEEE Trans. Computers, 2001

Implementing the one-sided Jacobi method on a 2D/3D mesh multicomputer.

Parallel Comput., 2001

Clustered Modulo Scheduling in a VLIW Architecture with Distributed Cache.

J. Instr. Level Parallelism, 2001

Dynamic Code Partitioning for Clustered Architectures.

Int. J. Parallel Program., 2001

CALMANT: A Systematic Method for the Execution of Hypercube Algorithms in Multiprocessor Systems.

Computación y Sistemas, 2001

CALMANT: Un Método Sistemático para la Ejecución de Algoritmos Hipercubo en Sistemas Multiprocesador.

Computación y Sistemas, 2001

Graph-partitioning based instruction scheduling for clustered processors.

Proceedings of the 34th Annual International Symposium on Microarchitecture, 2001

Energy-effective issue logic.

Daniele Folegnani

Proceedings of the 28th Annual International Symposium on Computer Architecture, 2001

Reducing the complexity of the issue logic.

Proceedings of the 15th international conference on Supercomputing, 2001

Selective Branch Prediction Reversal By Correlating with Data Values and Control Flow.

Proceedings of the 19th International Conference on Computer Design (ICCD 2001), 2001

Confidence Estimation for Branch Prediction Reversal.

Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

A Unified Modulo Scheduling and Register Allocation Technique for Clustered Processors.

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques (PACT 2001), 2001

2000

Optimizing cache miss equations polyhedra.

SIGARCH Comput. Archit. News, 2000

Analyzing Data Locality in Numeric Applications.

IEEE Micro, 2000

Dynamic Register Renaming Through Virtual-Physical Registers.

J. Instr. Level Parallelism, 2000

Modulo scheduling for a fully-distributed clustered VLIW architecture.

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Reducing wire delay penalty through value prediction.

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Very low power pipelines using significance compression.

James E. Smith

Proceedings of the 33rd Annual IEEE/ACM International Symposium on Microarchitecture, 2000

Instruction Scheduling for Clustered VLIW Architectures.

Proceedings of the 13th International Symposium on System Synthesis, 2000

An efficient solver for Cache Miss Equations.

Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

Multiple-banked register file architectures.

Proceedings of the 27th International Symposium on Computer Architecture (ISCA 2000), 2000

A Quantitative Assessment of Thread-Level Speculation Techniques.

Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

A low-complexity issue logic.

Proceedings of the 14th international conference on Supercomputing, 2000

The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures.

Proceedings of the 2000 International Conference on Parallel Processing, 2000

Dynamic Cluster Assignment Mechanisms.

Proceedings of the Sixth International Symposium on High-Performance Computer Architecture, 2000

A Fast and Accurate Approach to Analyze Cache Memory Behavior (Research Note).

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Complete Exchange Algorithms for Meshes and Tori Using a Systematic Approach (Research Note).

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999

Low Communication Overhead Jacobi Algorithms for Eigenvalues Computation on Hypercubes.

J. Supercomput., 1999

Randomized Cache Placement for Eliminating Conflicts.

Nigel P. Topham

IEEE Trans. Computers, 1999

Software Data Prefetching for Software Pipelined Loops.

J. Parallel Distributed Comput., 1999

Delaying Physical Register Allocation through Virtual-Physical Registers.

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

Value Prediction for Speculative Multithreaded Architectures.

Proceedings of the 32nd Annual IEEE/ACM International Symposium on Microarchitecture, 1999

A locality sensitive multi-module cache with explicit management.

Proceedings of the 13th international conference on Supercomputing, 1999

Dynamic removal of redundant computations.

Proceedings of the 13th international conference on Supercomputing, 1999

Clustered speculative multithreaded processors.

Proceedings of the 13th international conference on Supercomputing, 1999

Trace-Level Reuse.

Proceedings of the International Conference on Parallel Processing 1999, 1999

Reducing Memory Traffic Via Redundant Store Instructions.

Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

Exploiting Speculative Thread-Level Parallelism on a SMT Processor.

Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

The Synergy of Multithreading and Access/Execute Decoupling.

Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Control-Flow Speculation through Value Prediction for Superscalar Processors.

Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

A Cost-Effective Clustered Architecture.

Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998

Modulo Scheduling with Reduced Register Pressure.

IEEE Trans. Computers, 1998

A Method for Exploiting Communication/Computation Overlap in Hypercubes.

Parallel Comput., 1998

Data value speculation in superscalar processors.

Microprocess. Microsystems, 1998

Limits of Instruction Level Parallelism with Data Value Speculation.

Proceedings of the Vector and Parallel Processing, 1998

A Jacobi-based algorithm for computing symmetric eigenvalues and eigenvectors in a two-dimensional mesh.

Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing, 1998

Jacobi Orderings for Multi-Port Hypercubes.

Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

Speculative Multithreaded Processors.

Proceedings of the 12th international conference on Supercomputing, 1998

The Potential of Data Value Speculation to Boost ILP.

[BibT_eX]

[DOI]

Proceedings of the 12th international conference on Supercomputing, 1998

Control Speculation in Multithreaded Processors through Dynamic Loop Detection.

[BibT_eX]

[DOI]

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Virtual-Physical Registers.

[BibT_eX]

[DOI]

Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, Las Vegas, Nevada, USA, January 31, 1998

Software Prefetching for Software Pipelined Loops.

[BibT_eX]

[DOI]

Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, 1998

Divide-and-Conquer Algorithms on Two-Dimensional Meshes.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par '98 Parallel Processing, 1998

The Latency Hiding Effectiveness of Decoupled Access/Execute Processors.

[BibT_eX]

[DOI]

Proceedings of the 24th EUROMICRO '98 Conference, 1998

Data Speculative Multithreaded Architecture.

[BibT_eX]

[DOI]

Proceedings of the 24th EUROMICRO '98 Conference, 1998

Fast, Accurate and Flexible Data Locality Analysis.

[BibT_eX]

[DOI]

Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997

The Design and Performance of a Conflict-Avoiding Cache.

[BibT_eX]

[DOI]

Nigel P. Topham

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Cache Sensitive Modulo Scheduling.

[BibT_eX]

[DOI]

Proceedings of the Thirtieth Annual IEEE/ACM International Symposium on Microarchitecture, 1997

Eliminating Cache Conflict Misses through XOR-Based Placement Functions.

[BibT_eX]

[DOI]

Nigel P. Topham

Proceedings of the 11th international conference on Supercomputing, 1997

Speculative Execution via Address Prediction and Data Prefetching.

[BibT_eX]

[DOI]

Proceedings of the 11th international conference on Supercomputing, 1997

Virtual registers.

[BibT_eX]

[DOI]

Proceedings of the Fourth International Conference on High-Performance Computing, 1997

PARSAR: Parallelisation of a Chirp Scaling Algorithm SAR Processor.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par '97 Parallel Processing, 1997

Memory Address Prediction for Data Speculation.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par '97 Parallel Processing, 1997

A Methodology for User-Oriented Scalability Analysis.

[BibT_eX]

[DOI]

Proceedings of the 1997 International Conference on Application-Specific Systems, 1997

Static Locality Analysis for Cache Management.

[BibT_eX]

[DOI]

Proceedings of the 1997 Conference on Parallel Architectures and Compilation Techniques (PACT '97), 1997

1996

Communication Pipelining in Hypercubes.

[BibT_eX]

[DOI]

Parallel Process. Lett., 1996

The Multipath Architecture for Prolog Programs.

[BibT_eX]

[DOI]

E. Elias

Comput. J., 1996

Overlapping Communication and Computation in Hypercubes.

[BibT_eX]

[DOI]

Proceedings of the Euro-Par '96 Parallel Processing, 1996

Swing modulo scheduling: a lifetime-sensitive approach.

[BibT_eX]

[DOI]

Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995

Executing Algorithms with Hypercube Topology on Torus Multicomputers.

[BibT_eX]

[DOI]

IEEE Trans. Parallel Distributed Syst., 1995

Exploiting path parallelism in logic programming.

[BibT_eX]

[DOI]

Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing (PDP '95), 1995

Load Balancing in a Network Flow Optimization Code.

[BibT_eX]

[DOI]

Enric Fontdecaba

Jesús Labarta

Proceedings of the Applied Parallel Computing, 1995

Hypernode reduction modulo scheduling.

[BibT_eX]

[DOI]

Proceedings of the 28th Annual International Symposium on Microarchitecture, Ann Arbor, Michigan, USA, November 29, 1995

A Data Cache with Multiple Caching Strategies Tuned to Different Types of Locality.

[BibT_eX]

[DOI]

Carlos Aliagas

Proceedings of the 9th international conference on Supercomputing, 1995

1994

Design and Evaluation of an Instruction Cache for Reducing the Cost of Branches.

[BibT_eX]

[DOI]

Perform. Evaluation, 1994

Parallel Numerical Algorithms.

[BibT_eX]

[DOI]

Proceedings of the Second Euromicro Workshop on Parallel and Distributed Processing, 1994

The Multipath Parallel Execution Model for Prolog.

[BibT_eX]

Proceedings of the First International Symposium on Parallel Symbolic Computation, 1994

A Partial Breadth-First Execution Model for Prolog.

[BibT_eX]

[DOI]

Proceedings of the Sixth International Conference on Tools with Artificial Intelligence, 1994

Combining depth-first and breadth-first search in Prolog execution.

[BibT_eX]

Proceedings of the 1994 Joint Conference on Declarative Programming, 1994

1993

Reducing Branch Delay to Zero in Pipelined Processors.

[BibT_eX]

[DOI]

José M. Llabería

IEEE Trans. Computers, 1993

Chairmen's introduction.

[BibT_eX]

[DOI]

Jordi Cortadella

Microprocess. Microprogramming, 1993

MEM: A new execution model for Prolog.

[BibT_eX]

[DOI]

Microprocess. Microprogramming, 1993

A survey of branch techniques in pipelined processors.

[BibT_eX]

[DOI]

Microprocess. Microprogramming, 1993

The Xor embedding: An embedding of hypercubes onto rings and toruses.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Application-Specific Array Processors, 1993

1989

Instruction fetch unit for parallel execution of branch instructions.

[BibT_eX]

[DOI]

José M. Llabería

Proceedings of the 3rd international conference on Supercomputing, 1989

1988

A mechanism for reducing the cost of branches in RISC architectures.

[BibT_eX]

[DOI]