Albert Cohen

ORCID: 0000-0002-8866-5343

Affiliations:
  • Google, FR
  • INRIA and École Normale Supérieure, Paris, France (former)
  • PRiSM, Université de Versailles, Versailles, France (former)


According to our database, Albert Cohen authored at least 153 papers between 1998 and 2024.

Bibliography

2024
The MLIR Transform Dialect. Your compiler is more powerful than you think.
CoRR, 2024

The Next 700 ML-Enabled Compiler Optimizations.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

Strided Difference Bound Matrices.
Proceedings of the Computer Aided Verification - 36th International Conference, 2024

2023
Autotuning Convolutions Is Easier Than You Think.
ACM Trans. Archit. Code Optim., June, 2023

Bidirectional Reactive Programming for Machine Learning.
CoRR, 2023

Code Generation for In-Place Stencils.
Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 2023

RL4ReAl: Reinforcement Learning for Register Allocation.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

2022
Weaving Synchronous Reactions into the Fabric of SSA-form Compilers.
ACM Trans. Archit. Code Optim., 2022

End-to-end translation validation for the Halide language.
Proc. ACM Program. Lang., 2022

RL4ReAl: Reinforcement Learning for Register Allocation.
CoRR, 2022

Composable and Modular Code Generation in MLIR: A Structured and Retargetable Approach to Tensor Compiler Construction.
CoRR, 2022

Structured Operations: Modular Design of Code Generators for Tensor Compilers.
Proceedings of the Languages and Compilers for Parallel Computing, 2022

Loop Tree and Induction Variables.
Proceedings of the SSA-based Compiler Design, 2022

2021
Reconciling optimization with secure compilation.
Proc. ACM Program. Lang., 2021

Secure Optimization Through Opaque Observations.
CoRR, 2021

MLIR: Scaling Compiler Infrastructure for Domain Specific Computation.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Seamless Compiler Integration of Variable Precision Floating-Point Arithmetic.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

Progressive Raising in Multi-level IR.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically.
ACM Trans. Archit. Code Optim., 2020

Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs.
ACM Trans. Archit. Code Optim., 2020

Flextended Tiles: A Flexible Extension of Overlapped Tiles for Polyhedral Compilation.
ACM Trans. Archit. Code Optim., 2020

MLIR: A Compiler Infrastructure for the End of Moore's Law.
CoRR, 2020

Secure delivery of program properties through optimizing compilation.
Proceedings of the CC '20: 29th International Conference on Compiler Construction, 2020

VP Float: First Class Treatment for Variable Precision Floating Point Arithmetic.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Correct-by-Construction Parallelization of Hard Real-Time Avionics Applications on Off-the-Shelf Predictable Hardware.
ACM Trans. Archit. Code Optim., 2019

On the Representation of Partially Specified Implementations and its Application to the Optimization of Linear Algebra Kernels on GPU.
CoRR, 2019

Studying EM Pulse Effects on Superscalar Microarchitectures at ISA Level.
CoRR, 2019

Byte-Aware Floating-point Operations through a UNUM Computing Unit.
Proceedings of the 27th IFIP/IEEE International Conference on Very Large Scale Integration, 2019

Sheep in Wolf's Clothing: Implementation Models for Dataflow Multi-Threaded Software.
Proceedings of the 19th International Conference on Application of Concurrency to System Design, 2019

A First ISA-Level Characterization of EM Pulse Effects on Superscalar Microarchitectures: A Secure Software Perspective.
Proceedings of the 14th International Conference on Availability, Reliability and Security, 2019

2018
An Approach for Finding Permutations Quickly: Fusion and Dimension matching.
CoRR, 2018

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions.
CoRR, 2018

Polyhedral auto-transformation with no integer linear programming.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Design and Performance Analysis of Real-Time Dynamic Streaming Applications.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

A Unified Approach to Variable Renaming for Enhanced Vectorization.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

Meta-programming for cross-domain tensor optimizations.
Proceedings of the 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2018

Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling.
Proceedings of the 27th International Conference on Compiler Construction, 2018

A polyhedral compilation framework for loops with dynamic data-dependent bounds.
Proceedings of the 27th International Conference on Compiler Construction, 2018

2017
From a Formalized Parallel Action Language to Its Efficient Code Generation.
ACM Trans. Embed. Comput. Syst., 2017

Compiler-Assisted Loop Hardening Against Fault Attacks.
ACM Trans. Archit. Code Optim., 2017

Towards compositional and generative tensor optimizations.
Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, 2017

Optimization space pruning without regrets.
Proceedings of the 26th International Conference on Compiler Construction, 2017

2016
The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests.
ACM Trans. Program. Lang. Syst., 2016

Automatic Storage Optimization for Arrays.
ACM Trans. Program. Lang. Syst., 2016

In-Place Update in a Dataflow Synchronous Language: A Retiming-Enabled Language Experiment.
Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems, 2016

An interval constrained memory allocator for the Givy GAS runtime.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

NUMA-aware scheduling and memory allocation for data-flow task-parallel applications.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

SMO: an integrated approach to intra-array and inter-array storage optimization.
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016

Effective padding of multidimensional arrays to avoid cache conflict misses.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Language-Centric Performance Analysis of OpenMP Programs with Aftermath.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

A bounded memory allocator for software-defined global address spaces.
Proceedings of the 2016 ACM SIGPLAN International Symposium on Memory Management, Santa Barbara, CA, USA, June 14, 2016

Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems.
Proceedings of the 2016 IEEE International Symposium on Performance Analysis of Systems and Software, 2016

Transaction Parameterized Dataflow: A model for context-dependent streaming applications.
Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition, 2016

Reduction Drawing: Language Constructs and Polyhedral Compilation for Reductions on GPU.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Scalable Task Parallelism for NUMA: A Uniform Abstraction for Coordinated Scheduling and Memory Management.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Guest Editorial: Embedded Multicore Systems and Applications.
J. Signal Process. Syst., 2015

Polyhedral AST Generation Is More Than Scanning Polyhedra.
ACM Trans. Program. Lang. Syst., 2015

Managing the Latency of Data-Dependent Tasks in Embedded Streaming Applications.
Proceedings of the IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, 2015

Streaming Task Parallelism.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

An Empirical Evaluation of a Programming Model for Context-Dependent Real-time Streaming Applications.
Proceedings of the International Conference on Computational Science, 2015

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs.
ACM Trans. Archit. Code Optim., 2014

Topology-Aware and Dependence-Aware Scheduling and Memory Allocation for Task-Parallel Languages.
ACM Trans. Archit. Code Optim., 2014

The Relation Between Diamond Tiling and Hexagonal Tiling.
Parallel Process. Lett., 2014

Improving the design flow for parallel and heterogeneous architectures running real-time applications: The PHARAON FP7 project.
Microprocess. Microsystems, 2014

TERAFLUX: Harnessing dataflow in next generation teradevices.
Microprocess. Microsystems, 2014

Automatic Detection of Performance Anomalies in Task-Parallel Programs.
CoRR, 2014

A parallel action language for embedded applications and its compilation flow.
Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, 2014

Energy-aware parallelization flow and toolset for C code.
Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, 2014

Technology transfer towards Horizon 2020.
Proceedings of the Design, Automation & Test in Europe Conference & Exhibition, 2014

Hybrid Hexagonal/Classical Tiling for GPUs.
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014

Tiling and optimizing time-iterated computations on periodic domains.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Polyhedral parallel code generation for CUDA.
ACM Trans. Archit. Code Optim., 2013

OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs.
ACM Trans. Archit. Code Optim., 2013

A decoupled local memory allocator.
ACM Trans. Archit. Code Optim., 2013

Improved loop tiling based on the removal of spurious false dependences.
ACM Trans. Archit. Code Optim., 2013

Predictive Modeling in a Polyhedral Optimization Space.
Int. J. Parallel Program., 2013

Minimal Unroll Factor for Code Generation of Software Pipelining.
Int. J. Parallel Program., 2013

Correct and Efficient Accelerator Programming (Dagstuhl Seminar 13142).
Dagstuhl Reports, 2013

PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs.
CoRR, 2013

Correct and Efficient Bounded FIFO Queues.
Proceedings of the 25th International Symposium on Computer Architecture and High Performance Computing, 2013

Correct and efficient work-stealing for weak memory models.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

Sub-polyhedral scheduling using (unit-)two-variable-per-inequality polyhedra.
Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2013


EU FP7-288307 Pharaon Project: Parallel and Heterogeneous Architecture for Real-Time Applications.
Proceedings of the 2013 Euromicro Conference on Digital System Design, 2013

A polynomial spilling heuristic: Layered allocation.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

Split tiling for GPUs: automatic parallelization using trapezoidal tiles.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012
Automatic Extraction of Coarse-Grained Data-Flow Threads from Imperative Programs.
IEEE Micro, 2012

On the effectiveness of register moves to minimise post-pass unrolling in software pipelined loops.
Proceedings of the 2012 International Conference on High Performance Computing & Simulation, 2012

Programming parallelism with futures in Lustre.
Proceedings of the 12th International Conference on Embedded Software, 2012

2011
Coarse-grained loop parallelization: Iteration Space Slicing vs affine transformations.
Parallel Comput., 2011

ACOTES Project: Advanced Compiler Technologies for Embedded Streaming.
Int. J. Parallel Program., 2011

The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization.
CoRR, 2011

Transitive Closures of Affine Integer Tuple Relations and Their Overapproximations.
Proceedings of the Static Analysis - 18th International Symposium, 2011

Loop transformations: convexity, pruning and optimization.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

A Mutable Hardware Abstraction to Replace Threads.
Proceedings of the Languages and Compilers for Parallel Computing, 2011

Loop unrolling minimisation in the presence of multiple register types: A viable alternative to modulo variable expansion.
Proceedings of the 2011 International Conference on High Performance Computing & Simulation, 2011

Speculatively vectorized bytecode.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011

A stream-computing extension to OpenMP.
Proceedings of the High Performance Embedded Architectures and Compilers, 2011

Predictive modeling in a polyhedral optimization space.
Proceedings of the CGO 2011, 2011

Vapor SIMD: Auto-vectorize once, run everywhere.
Proceedings of the CGO 2011, 2011

2010
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework.
Proceedings of the Conference on High Performance Computing Networking, 2010

Split Register Allocation: Linear Complexity Without the Performance Penalty.
Proceedings of the High Performance Embedded Architectures and Compilers, 2010

Processor virtualization and split compilation for heterogeneous multicore embedded systems.
Proceedings of the 47th Design Automation Conference, 2010

ERBIUM: a deterministic, concurrent intermediate representation for portable and scalable performance.
Proceedings of the 7th Conference on Computing Frontiers, 2010

The Polyhedral Model Is More Widely Applicable Than You Think.
Proceedings of the Compiler Construction, 19th International Conference, 2010

Practical aggregation of semantical program properties for machine learning based optimization.
Proceedings of the 2010 International Conference on Compilers, 2010

Erbium: a deterministic, concurrent intermediate representation to map data-flow tasks to scalable, persistent streaming processes.
Proceedings of the 2010 International Conference on Compilers, 2010

2009
Optimizing Local Memory Allocation and Assignment through a Decoupled Approach.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Synchronization-Free Automatic Parallelization: Beyond Affine Iteration-Space Slicing.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Parametric multi-level tiling of imperfectly nested loops.
Proceedings of the 23rd international conference on Supercomputing, 2009

Software Pipelining in Nested Loops with Prolog-Epilog Merging.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Computing the Transitive Closure of a Union of Affine Integer Tuple Relations.
Proceedings of the Combinatorial Optimization and Applications, 2009

Polyhedral-Model Guided Loop-Nest Auto-Vectorization.
Proceedings of the PACT 2009, 2009

2008
Iterative optimization in the polyhedral model: part ii, multidimensional time.
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Post-pass periodic register allocation to minimise loop unrolling degree.
Proceedings of the 2008 ACM SIGPLAN/SIGBED Conference on Languages, 2008

Abstraction of Clocks in Synchronous Data-Flow Systems.
Proceedings of the Programming Languages and Systems, 6th Asian Symposium, 2008

2007
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations.
Trans. High Perform. Embed. Archit. Compil., 2007

Deep Jam: Conversion of Coarse-Grain Parallelism to Fine-Grain and Vector Parallelism.
J. Instr. Level Parallelism, 2007

Code-size conscious pipelining of imperfectly nested loops.
Proceedings of the 2007 workshop on MEmory performance, 2007

07361 Abstracts Collection -- Programming Models for Ubiquitous Parallelism.
Proceedings of the Programming Models for Ubiquitous Parallelism, 02.09. - 07.09.2007, 2007

07361 Introduction -- Programming Models for Ubiquitous Parallelism.
Proceedings of the Programming Models for Ubiquitous Parallelism, 02.09. - 07.09.2007, 2007

Iterative Optimization in the Polyhedral Model: Part I, One-Dimensional Time.
Proceedings of the Fifth International Symposium on Code Generation and Optimization (CGO 2007), 2007

Automatic Correction of Loop Transformations.
Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT 2007), 2007

Contributions to the Design of Reliable and Programmable High-Performance Systems: Principles, Interfaces, Algorithms and Tools. (Contributions à la conception de systèmes à hautes performances, programmables et sûrs: principes, interfaces, algorithmes et outils).
2007

2006
In search of a program generator to implement generic transformations for high-performance computing.
Sci. Comput. Program., 2006

Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies.
Int. J. Parallel Program., 2006

Beyond Iteration Vectors: Instancewise Relational Abstract Domains.
Proceedings of the Static Analysis, 13th International Symposium, 2006

N-synchronous Kahn networks: a relaxed model of synchrony for real-time systems.
Proceedings of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2006

Violated dependence analysis.
Proceedings of the 20th Annual International Conference on Supercomputing, 2006

Polyhedral Code Generation in the Real World.
Proceedings of the Compiler Construction, 15th International Conference, 2006

2005
A Language for the Compact Representation of Multiple Program Versions.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

Facilitating the search for compositions of program transformations.
Proceedings of the 19th Annual International Conference on Supercomputing, 2005

Induction Variable Analysis with Delayed Abstractions.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

A Practical Method for Quickly Evaluating Program Optimizations.
Proceedings of the High Performance Embedded Architectures and Compilers, 2005

Topic 4 - Compilers for High Performance.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Synchronization of periodic clocks.
Proceedings of the EMSOFT 2005, 2005

Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications.
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques (PACT 2005), 2005

2004
Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors.
Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, 2004

Branch Strategies to Optimize Decision Trees for Wide-Issue Architectures.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Applications of storage mapping optimization to register promotion.
Proceedings of the 18th Annual International Conference on Supercomputing, 2004

A Polyhedral Approach to Ease the Composition of Program Transformations.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2003

Putting Polyhedral Loop Transformations to Work.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

2002
Digital LC-2: from bits & gates to a little computer.
Proceedings of the 2002 workshop on Computer architecture education, 2002

Multi-periodic Process Networks: Prototyping and Verifying Stream-Processing Systems.
Proceedings of the Euro-Par 2002, 2002

2001
Induction Variable Analysis without Idiom Recognition: Beyond Monotonicity.
Proceedings of the Languages and Compilers for Parallel Computing, 2001

Monotonic evolution: an alternative to induction variable substitution for dependence analysis.
Proceedings of the 15th international conference on Supercomputing, 2001

2000
Maximal Static Expansion.
Int. J. Parallel Program., 2000

1999
Program Analysis and Transformation: From the Polytope Model to Formal Languages. (Analyse et transformation de programmes: du modèle polyédrique aux langages formels).
PhD thesis, 1999

Parallelization via Constrained Storage Mapping Optimization.
Proceedings of the High Performance Computing, Second International Symposium, 1999

Storage Mapping Optimization for Parallel Programs.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

1998
Instance-Wise Reaching Definition Analysis for Recursive Programs using Context-Free Transductions.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

