André Seznec

Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015

An empirical high level performance model for future many-cores.

Surya Narayanan

Proceedings of the 12th ACM International Conference on Computing Frontiers, 2015

2014

Efficient Out-of-Order Execution of Guarded ISAs.

Nathanaël Prémillieu

ACM Trans. Archit. Code Optim., 2014

Hardware/Software Helper Thread Prefetching on Heterogeneous Many Cores.

Alain Ketterlin

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing, 2014

Impact of Serial Scaling of Multi-threaded Programs in Many-Core Era.

Surya Narayanan

Proceedings of the 26th IEEE International Symposium on Computer Architecture and High Performance Computing Workshop, 2014

Skewed Compressed Caches.

Somayeh Sardashti

David A. Wood

Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

EOLE: Paving the way for an effective implementation of value prediction.

Arthur Perais

Proceedings of the ACM/IEEE 41st International Symposium on Computer Architecture, 2014

Practical data value speculation for future high-end processors.

Arthur Perais

Proceedings of the 20th IEEE International Symposium on High Performance Computer Architecture, 2014

2013

Faster unicores are still needed.

Proceedings of the 2013 International Conference on Embedded Computer Systems: Architectures, 2013

Selecting benchmark combinations for the evaluation of multicore throughput.

Ricardo A. Velásquez

Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems & Software, 2013

Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs.

Junjie Lai

Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Message from the program chairs.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012

SYRANT: SYmmetric resource allocation on not-taken and taken paths.

Nathanaël Prémillieu

ACM Trans. Archit. Code Optim., 2012

BADCO: Behavioral Application-Dependent Superscalar Core model.

Ricardo A. Velásquez

Proceedings of the 2012 International Conference on Embedded Computer Systems: Architectures, 2012

PRETI: partitioned real-time shared cache for mixed-criticality real-time systems.

Benjamin Lesage

Isabelle Puaut

Proceedings of the 20th International Conference on Real-Time and Network Systems, 2012

Break down GPU execution time with an analytical method.

Junjie Lai

Proceedings of the 2012 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, 2012

2011

Branch Predictors.

Proceedings of the Encyclopedia of Parallel Computing, 2011

Managing SMT resource usage through speculative instruction window weighting.

ACM Trans. Archit. Code Optim., 2011

Fairness Metrics for Multi-Threaded Processors.

IEEE Comput. Archit. Lett., 2011

A new case for the TAGE branch predictor.

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011

Storage free confidence estimation for the TAGE branch predictor.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Practical and secure PCM systems by online detection of malicious write streams.

Proceedings of the 17th International Conference on High-Performance Computer Architecture (HPCA-17 2011), 2011

Decoupled zero-compressed memory.

Julien Dusser

Proceedings of the High Performance Embedded Architectures and Compilers, 2011

2010

A Phase Change Memory as a Secure Main Memory.

IEEE Comput. Archit. Lett., 2010

Proposition for a sequential accelerator in future general-purpose manycore processors and the problem of migration-induced cache misses.

Yiannakis Sazeides

Proceedings of the 7th Conference on Computing Frontiers, 2010

2009

Fetch Gating Control through Speculative Instruction Window Weighting.

Trans. High Perform. Embed. Archit. Compil., 2009

Parallel HAVEGE.

Proceedings of the Parallel Processing and Applied Mathematics, 2009

Zero-content augmented caches.

Julien Dusser

Thomas Piquet

Proceedings of the 23rd international conference on Supercomputing, 2009

2008

Speculative return address stack management revisited.

Dionisios N. Pnevmatikatos

ACM Trans. Archit. Code Optim., 2008

2007

High-Performance Embedded Architecture and Compilation Roadmap.

Michael F. P. O'Boyle

Trans. High Perform. Embed. Archit. Compil., 2007

A study of thread migration in temperature-constrained multicores.

Theofanis Constantinou

ACM Trans. Archit. Code Optim., 2007

The Idealistic GTL Predictor.

J. Instr. Level Parallelism, 2007

The L-TAGE Branch Predictor.

J. Instr. Level Parallelism, 2007

Exploiting Single-Usage for Effective Memory Management.

Thomas Piquet

Olivier Rochecouste

Proceedings of the Advances in Computer Systems Architecture, 2007

2006

A case for a complexity-effective, width-partitioned microarchitecture.

Olivier Rochecouste

Gilles Pokam

ACM Trans. Archit. Code Optim., 2006

A case for (partially) TAgged GEometric history length branch prediction.

J. Instr. Level Parallelism, 2006

2005

Conflict-Free Accesses to Strided Vectors on a Banked Cache.

Julio César Hernández Castro

Roger Espasa

IEEE Trans. Computers, 2005

Performance implications of single thread migration on a chip multi-core.

Theofanis Constantinou

SIGARCH Comput. Archit. News, 2005

The strict avalanche criterion randomness test.

Math. Comput. Simul., 2005

Genesis of the O-GEHL Branch Predictor.

J. Instr. Level Parallelism, 2005

Analysis of the O-GEometric History Length Branch Predictor.

Proceedings of the 32nd International Symposium on Computer Architecture (ISCA 2005), 2005

2004

Concurrent Support of Multiple Page Sizes on a Skewed Associative TLB.

IEEE Trans. Computers, 2004

CASH: Revisiting Hardware Sharing in Single-Chip Parallel Processors.

Romain Dolbeau

J. Instr. Level Parallelism, 2004

IATO: A Flexible EPIC Simulation Environment.

Amaury Darsch

Julio César Hernández Castro

Proceedings of the 16th Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2004), 2004

Speculative software management of datapath-width for energy optimization.

Proceedings of the 2004 ACM SIGPLAN/SIGBED Conference on Languages, 2004

The SAC Test: A New Randomness Test, with Some Applications to PRNG Analysis.

José María Sierra

Julio César Hernández Castro

Proceedings of the Computational Science and Its Applications, 2004

Topic 8: Parallel Computer Architecture and Instruction-Level Parallelism.

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

On the design of state-of-the-art pseudorandom number generators by means of genetic programming.

Pedro Isasi

Proceedings of the IEEE Congress on Evolutionary Computation, 2004

2003

HAVEGE: A user-level software heuristic for generating empirically strong random numbers.

Nicolas Sendrier

ACM Trans. Model. Comput. Simul., 2003

Effective ahead Pipelining of Instruction Block Address Generation.

Antony Fraboulet

Proceedings of the 30th International Symposium on Computer Architecture (ISCA 2003), 2003

2002

Register write specialization register read specialization: a path to complexity-effective wide-issue superscalar processors.

Eric Toullec

Olivier Rochecouste

Proceedings of the 35th Annual International Symposium on Microarchitecture, 2002

Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor.

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

Tarantula: A Vector Extension to the Alpha Architecture.

Proceedings of the 29th International Symposium on Computer Architecture (ISCA 2002), 2002

2001

An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors.

Stéphan Jourdan

Int. J. Parallel Program., 2001

Boosting SMT Performance by Speculation Control.

Kun Luo

Manoj Franklin

Shubhendu S. Mukherjee

Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001

Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors.

Proceedings of the Seventh International Symposium on High-Performance Computer Architecture (HPCA'01), 2001

Topic 08+13: Instruction-Level Parallelism and Computer Architecture.

Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000

Handling Global Constraints in Compiler Strategy.

Int. J. Parallel Program., 2000

Combining Light Static Code Annotation and Instruction-Set Emulation for Flexible and Efficient On-the-Fly Simulation (Research Note).

Thierry Lafage

Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999

Out-of-Order Execution may not be Cost-Effective on Processors Featuring Simultaneous Multithreading.

Sébastien Hily

Proceedings of the Fifth International Symposium on High-Performance Computer Architecture, 1999

Code Cloning Tracing: A "Pay per Trace" Approach.

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

OCEANS - Optimising Compilers for Embedded Applications.

Peter M. W. Knijnenburg

Paul van der Mark

Andy Nisbet

Michael F. P. O'Boyle

Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors.

Stéphan Jourdan

Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

1998

OCEANS: Optimising Compilers for Embedded Applications.

Peter M. W. Knijnenburg

Michael F. P. O'Boyle

Proceedings of the Euro-Par '98 Parallel Processing, 1998

Improving Cache Behavior of Dynamically Allocated Data Structures.

D. N. Truong

Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

1997

Decoupled Sectored Caches.

IEEE Trans. Computers, 1997

Skewed Associativity Improves Program Performance and Enhances Predictability.

IEEE Trans. Computers, 1997

Trading Conflict and Capacity Aliasing in Conditional Branch Predictors.

Richard Uhlig

Proceedings of the 24th International Symposium on Computer Architecture, 1997

OCEANS: Optimizing Compilers for Embedded Applications.

Proceedings of the Euro-Par '97 Parallel Processing, 1997

1996

Don't Use the Page Number, But a Pointer To It.

Proceedings of the 23rd Annual International Symposium on Computer Architecture, 1996

Multiple-Block Ahead Branch Predictors.

Proceedings of ASPLOS-VII, 1996

Branch prediction and simultaneous multithreading.

Sébastien Hily

Proceedings of the Fifth International Conference on Parallel Architectures and Compilation Techniques, 1996

1995

About Cache Associativity in Low-Cost Shared Memory Multi-Microprocessors.

Parallel Process. Lett., 1995

Odd Memory Systems: A New Approach.

J. Parallel Distributed Comput., 1995

Skewed Associativity Enhances Performance Predictability.

Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995

DASC Cache.

Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture (HPCA 1995), 1995

Direct-mapped versus set-associative pipelined caches.

Nathalie Drach

Daniel Windheiser

Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques, 1995

1994

Interleaved Parallel Schemes.

IEEE Trans. Parallel Distributed Syst., 1994

Decoupled Sectored Caches: Conciliating Low Tag Implementation Cost and Low Miss Ratio.

Proceedings of the 21st Annual International Symposium on Computer Architecture. Chicago, 1994

1993

Skewed-associative Caches.

Proceedings of the PARLE '93, 1993

MIDEE: smoothing branch and instruction cache miss penalties on deep pipelines.

Nathalie Drach

Proceedings of the 26th Annual International Symposium on Microarchitecture, 1993

Odd Memory Systems May be Quite Interesting.

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

A Case for Two-Way Skewed-Associative Caches.

Proceedings of the 20th Annual International Symposium on Computer Architecture, 1993

Semi-Unified Caches.

Nathalie Drach

Proceedings of the 1993 International Conference on Parallel Processing, 1993

About Set and Skewed Associativity on Second-Level Caches.

Proceedings of the 1993 International Conference on Computer Design: VLSI in Computers & Processors, 1993

1992

Controlling and sequencing a heavily pipelined floating-point operator.

Karl Courtel

Proceedings of the 25th Annual International Symposium on Microarchitecture, 1992

Interleaved Parallel Schemes: Improving Memory Throughput on Supercomputers.

Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

OPAC: A floating-point coprocessor dedicated to compute-bound kernels.

Karl Courtel

Proceedings of the 19th Annual International Symposium on Computer Architecture. Gold Coast, 1992

1989

An asynchronous buffering network for tightly coupled multiprocessors.

Proceedings of the 3rd international conference on Supercomputing, 1989

1988

Synchronizing Processors Through Memory Requests in a Tightly Coupled Multiprocessor.

Proceedings of the 15th Annual International Symposium on Computer Architecture, 1988

Towards a large number of pipeline processors in a tightly coupled multiprocessor using no cache.

Proceedings of the 2nd international conference on Supercomputing, 1988

1987

A New Interconnection Network for SIMD Computers: The Sigma Network.

IEEE Trans. Computers, 1987

Optimizing Memory Throughput In a Tightly Coupled Multiprocessor.

Proceedings of the International Conference on Parallel Processing, 1987

1986

Data Synchronized Pipeline Architecture: Pipelining in Multiprocessor Environments.

J. Parallel Distributed Comput., 1986

An Efficient Routing Control Unit for the SIGMA Network E(4).
