Fabrice Rastello

Orcid: 0000-0002-6589-9956

Affiliations:
  • INRIA, France


According to our database, Fabrice Rastello authored at least 86 papers between 1998 and 2024.

Bibliography

2024
Performance bottlenecks detection through microarchitectural sensitivity.
CoRR, 2024

CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers.
CoRR, 2024

Tightening I/O Lower Bounds through the Hourglass Dependency Pattern.
Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures, 2024

EasyTracker: A Python Library for Controlling and Inspecting Program Execution.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2024

2023
Autotuning Convolutions Is Easier Than You Think.
ACM Trans. Archit. Code Optim., June, 2023

2022
PALMED: Throughput Characterization for Superscalar Architectures.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

Graphs and Gating Functions.
Proceedings of the SSA-based Compiler Design, 2022

Standard Construction and Destruction Algorithms.
Proceedings of the SSA-based Compiler Design, 2022

Introduction.
Proceedings of the SSA-based Compiler Design, 2022

Introduction.
Proceedings of the SSA-based Compiler Design, 2022

SSA Destruction for Machine Code.
Proceedings of the SSA-based Compiler Design, 2022

Static Single Information Form.
Proceedings of the SSA-based Compiler Design, 2022

Properties and Flavours.
Proceedings of the SSA-based Compiler Design, 2022

Register Allocation.
Proceedings of the SSA-based Compiler Design, 2022

Liveness.
Proceedings of the SSA-based Compiler Design, 2022

2021
IOOpt: automatic derivation of I/O complexity bounds for affine programs.
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021

Do Common Educational Datasets contain Static Information? A Statistical Study.
Proceedings of the 14th International Conference on Educational Data Mining, 2021

PolyBench/Python: benchmarking Python environments with polyhedral optimizations.
Proceedings of the CC '21: 30th ACM SIGPLAN International Conference on Compiler Construction, 2021

2020
Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable.
ACM Trans. Archit. Code Optim., 2020

From micro-OPs to abstract resources: constructing a simpler CPU performance model through microbenchmarking.
CoRR, 2020

Efficient tiled sparse matrix multiplication through matrix signatures.
Proceedings of the International Conference for High Performance Computing, 2020

Automated derivation of parametric data movement lower bounds for affine programs.
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020

2019
Analytical cache modeling and tilesize optimization for tensor contractions.
Proceedings of the International Conference for High Performance Computing, 2019

Data-flow/dependence profiling for structured transformations.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

2018
Associative instruction reordering to alleviate register pressure.
Proceedings of the International Conference for High Performance Computing, 2018

Register optimizations for stencils on GPUs.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Performance modeling for GPUs using abstract kernel emulation.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

GPU code optimization using abstract kernel emulation and sensitivity analysis.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

2017
Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Simplification and runtime resolution of data dependence constraints for loop transformations.
Proceedings of the International Conference on Supercomputing, 2017

POSTER: Statement Reordering to Alleviate Register Pressure for Stencils on GPUs.
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017

2016
Static and Dynamic Frequency Scaling on Multicore CPUs.
ACM Trans. Archit. Code Optim., 2016

Brief Announcement: Approximating the I/O Complexity of One-Shot Red-Blue Pebbling.
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment.
Proceedings of the International Conference for High Performance Computing, 2016

An interval constrained memory allocator for the Givy GAS runtime.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

PolyCheck: dynamic verification of iteration space transformations on affine programs.
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016

Effective padding of multidimensional arrays to avoid cache conflict misses.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Generalized cache tiling for dataflow programs.
Proceedings of the 17th ACM SIGPLAN/SIGBED Conference on Languages, 2016

Description, Implementation and Evaluation of an Affinity Clause for Task Directives.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

A bounded memory allocator for software-defined global address spaces.
Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, Santa Barbara, CA, USA, June 14, 2016

Using Data Dependencies to Improve Task-Based Scheduling Strategies on NUMA Architectures.
Proceedings of the Euro-Par 2016: Parallel Processing, 2016

On fusing recursive traversals of K-d trees.
Proceedings of the 25th International Conference on Compiler Construction, 2016

Register allocation and promotion through combined instruction scheduling and loop unrolling.
Proceedings of the 25th International Conference on Compiler Construction, 2016

POSTER: Hybrid Data Dependence Analysis for Loop Transformations.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
On Characterizing the Data Access Complexity of Programs.
Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015

Runtime pointer disambiguation.
Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, 2015

2014
On Using the Roofline Model with Lower Bounds on Data Movement.
ACM Trans. Archit. Code Optim., 2014

A Tiling Perspective for Register Optimization.
CoRR, 2014

On characterizing the data movement complexity of computational DAGs for parallel execution.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

A framework for enhancing data reuse via associative reordering.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Parameterized Construction of Program Representations for Sparse Dataflow Analyses.
Proceedings of the Compiler Construction - 23rd International Conference, 2014

2013
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential.
ACM Trans. Archit. Code Optim., 2013

A polynomial spilling heuristic: Layered allocation.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

2012
SSI Properties Revisited.
ACM Trans. Embed. Comput. Syst., 2012

2011
Decoupled graph-coloring register allocation with hierarchical aliasing.
Proceedings of the 14th International Workshop on Software and Compilers for Embedded Systems, 2011

Graph-coloring and treescan register allocation using repairing.
Proceedings of the 14th International Conference on Compilers, 2011

A Non-iterative Data-Flow Algorithm for Computing Liveness Sets in Strict SSA Programs.
Proceedings of the Programming Languages and Systems - 9th Asian Symposium, 2011

2010
Parallel copy motion.
Proceedings of the 13th International Workshop on Software and Compilers for Embedded Systems, 2010

Split Register Allocation: Linear Complexity Without the Performance Penalty.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

2009
Revisiting Out-of-SSA Translation for Correctness, Code Quality and Efficiency.
Proceedings of the CGO 2009, 2009

2008
Fast liveness checking for SSA-form programs.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

Advanced conservative and optimistic register coalescing.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
On the complexity of spill everywhere under SSA form.
Proceedings of the 2007 ACM SIGPLAN/SIGBED Conference on Languages, 2007

On the Complexity of Register Coalescing.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

2006
Register Allocation: What Does the NP-Completeness Proof of Chaitin et al. Really Prove? Or Revisiting Register Allocation: Why and How.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

2005
Procedure placement using temporal-ordering information: Dealing with code size expansion.
J. Embed. Comput., 2005

2004
Optimizing Translation Out of SSA Using Renaming Constraints.
Proceedings of the 2nd IEEE / ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004

2003
Optimal task scheduling at run time to exploit intra-tile parallelism.
Parallel Comput., 2003

2002
Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles.
IEEE Trans. Parallel Distributed Syst., 2002

Dense linear algebra kernels on heterogeneous platforms: Redistribution issues.
Parallel Comput., 2002

Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms.
Algorithmica, 2002

Efficient Tiling for an ODE Discrete Integration Program: Redundant Tasks Instead of Trapezoidal Shaped-Tiles.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

2001
Matrix Multiplication on Heterogeneous Platforms.
IEEE Trans. Parallel Distributed Syst., 2001

A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers).
IEEE Trans. Computers, 2001

Alignment and Distribution Is Not (Always) NP-Hard.
J. Parallel Distributed Comput., 2001

Static LU Decomposition on Heterogeneous Platforms.
Int. J. High Perform. Comput. Appl., 2001

Heterogeneous Matrix-Matrix Multiplication or Partitioning a Square into Rectangles: NP-Completeness and Approximation Algorithms.
Proceedings of the Ninth Euromicro Workshop on Parallel and Distributed Processing, 2001

2000
Load Balancing Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-Dimensional Grids.
Proceedings of the 14th International Parallel & Distributed Processing Symposium (IPDPS'00), 2000

Matrix-Matrix Multiplication on Heterogeneous Platforms.
Proceedings of the 2000 International Conference on Parallel Processing, 2000

Heterogeneity Considered Harmful to Algorithm Designers.
Proceedings of the 2000 IEEE International Conference on Cluster Computing (CLUSTER 2000), November 28th, 2000

1999
Algorithmic Issues on Heterogeneous Computing Platforms.
Parallel Process. Lett., 1999

PVM Implementation of Heterogeneous ScaLAPACK Dense Linear Solvers.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 1999

Algorithmic Issues for (Distributed) Heterogeneous Computing Platforms.
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers).
Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, 1999

1998
Determining the Idle Time of a Tiling: New Results.
J. Inf. Sci. Eng., 1998

Optimal Task Scheduling to Minimize Inter-Tile Latencies.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

