Paul H. J. Kelly

ORCID: 0000-0001-5905-1804

Affiliations:
  • Imperial College London


According to our database, Paul H. J. Kelly authored at least 161 papers between 1987 and 2024.

Bibliography

2024
A Robot Web for Distributed Many-Device Localization.
IEEE Trans. Robotics, 2024

Distributed Simultaneous Localisation and Auto-Calibration Using Gaussian Belief Propagation.
IEEE Robotics Autom. Lett., 2024

PCQ: Parallel Compact Quantum Circuit Simulation.
Proceedings of the 32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2024

Gaussian Splatting SLAM.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

A shared compilation stack for distributed-memory parallelism in stencil DSLs.
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2024

2023
High-frame rate homography and visual odometry by tracking binary features from the focal plane.
Auton. Robots, December, 2023

Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison.
IEEE Trans. Parallel Distributed Syst., May, 2023

Automated MPI code generation for scalable finite-difference solvers.
CoRR, 2023

Precise event sampling-based data locality tools for AMD multicore architectures.
Concurr. Comput. Pract. Exp., 2023

2022
Identification and Classification of Off-Vertex Critical Points for Contour Tree Construction on Unstructured Meshes of Hexahedra.
IEEE Trans. Vis. Comput. Graph., 2022

Convolutional kernel function algebra.
Frontiers Comput. Sci., 2022

Tensor Computations: Applications and Optimization (Dagstuhl Seminar 22101).
Dagstuhl Reports, 2022

A Robot Web for Distributed Many-Device Localisation.
CoRR, 2022

Compiling CNNs with Cain: focal-plane processing for robot navigation.
Auton. Robots, 2022

Systematic comparison of path planning algorithms using PathBench.
Adv. Robotics, 2022

2021
Extending the RISC-V ISA for exploring advanced reconfigurable SIMD instructions.
CoRR, 2021

Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Simodense: a RISC-V softcore optimised for exploring custom SIMD instructions.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

Demonstrating custom SIMD instruction development for a RISC-V softcore.
Proceedings of the 31st International Conference on Field-Programmable Logic and Applications, 2021

PathBench: A Benchmarking Platform for Classical and Learned Path Planning Algorithms.
Proceedings of the 18th Conference on Robots and Vision, 2021

2020
Architecture and Performance of Devito, a System for Automated Stencil Computation.
ACM Trans. Math. Softw., 2020

A study of vectorization for matrix-free finite element methods.
Int. J. High Perform. Comput. Appl., 2020

Tensor Computations: Applications and Optimization (Dagstuhl Seminar 20111).
Dagstuhl Reports, 2020

Lossy Checkpoint Compression in Full Waveform Inversion.
CoRR, 2020

Abstracting spreadsheet data flow through hypergraph redrawing.
CoRR, 2020

AnalogNet: Convolutional Neural Network Inference on Analog Focal Plane Sensor Processors.
CoRR, 2020

Cain: Automatic Code Generation for Simultaneous Convolutional Kernels on Focal-plane Sensor-processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

BIT-VO: Visual Odometry at 300 FPS using Binary Features from the Focal Plane.
Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020

Scalable Uncertainty for Computer Vision With Functional Variational Inference.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling.
ACM Trans. Math. Softw., 2019

AUKE: Automatic Kernel Code Generation for an Analogue SIMD Focal-Plane Sensor-Processor Array.
ACM Trans. Archit. Code Optim., 2019

Pangloss: a novel Markov chain prefetcher.
CoRR, 2019

Characterizing Visual Localization and Mapping Datasets.
Proceedings of the International Conference on Robotics and Automation, 2019

SLAMBench 3.0: Systematic Automated Reproducible Evaluation of SLAM Systems for Robot Vision Challenges and Scene Understanding.
Proceedings of the International Conference on Robotics and Automation, 2019

Adaptive-Resolution Octree-Based Volumetric SLAM.
Proceedings of the 2019 International Conference on 3D Vision, 2019

2018
Efficient Octree-Based Volumetric SLAM Supporting Signed-Distance and Occupancy Mapping.
IEEE Robotics Autom. Lett., 2018

Navigating the Landscape for Real-Time Localization and Mapping for Robotics and Virtual and Augmented Reality.
Proc. IEEE, 2018

Loop Optimization (Dagstuhl Seminar 18111).
Dagstuhl Reports, 2018

Navigating the Landscape for Real-time Localisation and Mapping for Robotics and Virtual and Augmented Reality.
CoRR, 2018

Investigating automatic vectorization for real-time 3D scene understanding.
Proceedings of the 4th Workshop on Programming Models for SIMD/Vector Processing, 2018

Towards In-Situ Vortex Identification for Peta-Scale CFD Using Contour Trees.
Proceedings of the 8th IEEE Symposium on Large Data Analysis and Visualization, 2018

Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2018

SLAMBench2: Multi-Objective Head-to-Head Benchmarking for Visual SLAM.
Proceedings of the 2018 IEEE International Conference on Robotics and Automation, 2018

2017
Trends in Data Locality Abstractions for HPC Systems.
IEEE Trans. Parallel Distributed Syst., 2017

Firedrake: Automating the Finite Element Method by Composing Abstractions.
ACM Trans. Math. Softw., 2017

An Algorithm for the Optimization of Finite Element Integration Loops.
ACM Trans. Math. Softw., 2017

Performance Portability in Extreme Scale Computing (Dagstuhl Seminar 17431).
Dagstuhl Reports, 2017

Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modelling.
CoRR, 2017

Algebraic description and automatic generation of multigrid methods in SPIRAL.
Concurr. Comput. Pract. Exp., 2017

Algorithmic Performance-Accuracy Trade-off in 3D Vision Applications Using HyperMapper.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Application-oriented design space exploration for SLAM algorithms.
Proceedings of the 2017 IEEE International Conference on Robotics and Automation, 2017

2016
Acceleration of a Full-Scale Industrial CFD Application with OP2.
IEEE Trans. Parallel Distributed Syst., 2016

GiMMiK - Generating bespoke matrix multiplication kernels for accelerators: Application to high-order Computational Fluid Dynamics.
Comput. Phys. Commun., 2016

A numbering algorithm for finite element on extruded meshes which avoids the unstructured mesh penalty.
CoRR, 2016

Diplomat: Mapping of Multi-kernel Applications Using a Static Dataflow Abstraction.
Proceedings of the 24th IEEE International Symposium on Modeling, 2016

Comparative design space exploration of dense and semi-dense SLAM.
Proceedings of the 2016 IEEE International Conference on Robotics and Automation, 2016

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Optimised three-dimensional Fourier interpolation: An analysis of techniques and application to a linear-scaling density functional theory code.
Comput. Phys. Commun., 2015

An Interrupt-Driven Work-Sharing For-Loop Scheduler.
CoRR, 2015

Thread Parallelism for Highly Irregular Computation in Anisotropic Mesh Adaptation.
CoRR, 2015

Generating Optimized Fourier Interpolation Routines for Density Functional Theory Using SPIRAL.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM.
Proceedings of the IEEE International Conference on Robotics and Automation, 2015

A Fast and Scalable Graph Coloring Algorithm for Multi-core and Many-core Architectures.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

2014
Symbolic Crosschecking of Data-Parallel Floating-Point Code.
IEEE Trans. Software Eng., 2014

Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly.
ACM Trans. Archit. Code Optim., 2014

COFFEE: an Optimizing Compiler for Finite Element Local Assembly.
CoRR, 2014

Multidisciplinary Engineering Models: Methodology and Case Study in Spreadsheet Analytics.
CoRR, 2014

Dense planar SLAM.
Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2014

Generalizing Run-Time Tiling with the Loop Chain Abstraction.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

2013
Optimized code generation for finite element local assembly using symbolic manipulation.
ACM Trans. Math. Softw., 2013

Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems.
Parallel Comput., 2013

Parallel partitioning for distributed systems using sequential assignment.
J. Parallel Distributed Comput., 2013

A thread-parallel algorithm for anisotropic mesh adaptation.
CoRR, 2013

Performance-Portable Finite Element Assembly Using PyOP2 and FEniCS.
Proceedings of the Supercomputing - 28th International Supercomputing Conference, 2013

Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels.
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013

Parametric GPU Code Generation for Affine Loop Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

Loop Chaining: A Programming Abstraction for Balancing Locality and Parallelism.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

SLAM++: Simultaneous Localisation and Mapping at the Level of Objects.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

Computationally unifying urban masterplanning.
Proceedings of the Computing Frontiers Conference, 2013

Split tiling for GPUs: automatic parallelization using trapezoidal tiles.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

2012
Predictive modeling and analysis of OP2 on distributed memory GPU clusters.
SIGMETRICS Perform. Evaluation Rev., 2012

Introduction to the Special Issue on Automatic Program Generation for Embedded Systems.
Sci. Comput. Program., 2012

Hybrid OpenMP/MPI Anisotropic Mesh Smoothing.
Proceedings of the International Conference on Computational Science, 2012

Improving communication latency with the write-only architecture.
J. Parallel Distributed Comput., 2012

Guest Editorial: Computing Frontiers.
Int. J. Parallel Program., 2012

Performance Analysis and Optimization of the OP2 Framework on Many-Core Architectures.
Comput. J., 2012

PyOP2: A High-Level Framework for Performance-Portable Simulations on Unstructured Meshes.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

An Analytical Study of Loop Tiling for a Large-Scale Unstructured Mesh Application.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Compiler Optimizations for Industrial Unstructured Mesh CFD Applications on GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

Using domain-specific languages and access-execute descriptors to expand the parallel code synthesis design space: keynote talk.
Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, 2012

Mesh independent loop fusion for unstructured mesh applications.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Performance analysis of the OP2 framework on many-core architectures.
SIGMETRICS Perform. Evaluation Rev., 2011

DESOLA: An active linear algebra library using delayed evaluation and runtime code generation.
Sci. Comput. Program., 2011

Symbolic Testing of OpenCL Code.
Proceedings of the Hardware and Software: Verification and Testing, 2011

Symbolic crosschecking of floating-point and SIMD code.
Proceedings of the European Conference on Computer Systems, 2011

Accelerating Anisotropic Mesh Adaptivity on nVIDIA's CUDA Using Texture Interpolation.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

Design and Performance of the OP2 Library for Unstructured Mesh Applications.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

Loop-Directed Mothballing: Power-gating execution units using fast analysis of inner loops.
Proceedings of the 2011 IEEE Symposium on Low-Power and High-Speed Chips, 2011

2010
Towards generating optimised finite element solvers for GPUs from high-level specifications.
Proceedings of the International Conference on Computational Science, 2010

A batch algorithm for maintaining a topological order.
Proceedings of the Computer Science 2010, 2010

2009
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

Towards Metaprogramming for Parallel Systems on a Chip.
Proceedings of the Euro-Par 2009, 2009

High-performance SIMT code generation in an active visual effects library.
Proceedings of the 6th Conference on Computing Frontiers, 2009

2008
Inference of Session Types From Control Flow.
Proceedings of the 5th International Workshop on Formal Foundations of Embedded Software and Component-Based Software Architectures, 2008

2007
Efficient field-sensitive pointer analysis of C.
ACM Trans. Program. Lang. Syst., 2007

Profiling with AspectJ.
Softw. Pract. Exp., 2007

Explicit Dependence Metadata in an Active Visual Effects Library.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

A Declarative Framework for Analysis and Optimization.
Proceedings of the Compiler Construction, 16th International Conference, 2007

2006
A dynamic topological sort algorithm for directed acyclic graphs.
ACM J. Exp. Algorithmics, 2006

Performance prediction of paging workloads using lightweight tracing.
Future Gener. Comput. Syst., 2006

Is Morton layout competitive for large two-dimensional arrays yet?
Concurr. Comput. Pract. Exp., 2006

Automatically translating a general purpose C++ image processing library for GPUs.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Topic 4: Compilers for High Performance.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

2005
Generative and Adaptive Methods in Performance Programming.
Parallel Process. Lett., 2005

Minimizing Associativity Conflicts in Morton Layout.
Proceedings of the Parallel Processing and Applied Mathematics, 2005

A Domain-Specific Interpreter for Parallelizing a Large Mixed-Language Visualisation Application.
Proceedings of the Languages and Compilers for Parallel Computing, 2005

2004
Online Cycle Detection and Difference Propagation: Applications to Pointer Analysis.
Softw. Qual. J., 2004

A Dynamic Algorithm for Topologically Sorting Directed Acyclic Graphs.
Proceedings of the Experimental and Efficient Algorithms, Third International Workshop, 2004

Overcoming barriers to restructuring in a modular visualisation environment.
Proceedings of the 7th Workshop on languages, 2004

Topic 10: Parallel Programming: Models, Methods and Programming Languages.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

MEProf: modular extensible profiling for Eclipse.
Proceedings of the 2004 OOPSLA workshop on Eclipse Technology eXchange, 2004

2003
Search strategies for Java bottleneck location by dynamic instrumentation.
IEE Proc. Softw., 2003

Online Cycle Detection and Difference Propagation for Pointer Analysis.
Proceedings of the 3rd IEEE International Workshop on Source Code Analysis and Manipulation (SCAM 2003), 2003

Optimising Java RMI Programs by Communication Restructuring.
Proceedings of the Middleware 2003, 2003

Improving the Performance of Morton Layout by Array Alignment and Loop Unrolling: Reducing the Price of Naivety.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Runtime Code Generation in C++ as a Foundation for Domain-Specific Optimisation.
Proceedings of the Domain-Specific Program Generation, International Seminar, 2003

2002
Is Morton Layout Competitive for Large Two-Dimensional Arrays?
Proceedings of the Euro-Par 2002, 2002

Delayed Evaluation, Self-optimising Software Components as a Programming Model.
Proceedings of the Euro-Par 2002, 2002

Instant-Access Cycle-Stealing for Parallel Applications Requiring Interactive Response.
Proceedings of the Euro-Par 2002, 2002

Optimising Shared Reduction Variables in MPI Programs.
Proceedings of the Euro-Par 2002, 2002

GILK: A Dynamic Instrumentation Tool for the Linux Kernel.
Proceedings of the Computer Performance Evaluation, 2002

2001
THEMIS: Component Dependence Metadata in Adaptive Parallel Applications.
Parallel Process. Lett., 2001

Pipelined functional tree accesses and updates: scheduling, synchronization, caching and coherence.
J. Funct. Program., 2001

Topic 10: Parallel Programming: Models, Methods and Programming Languages.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
Run-Time Fusion of MPI Calls in a Parallel C++ Library.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Adaptive Proxies: Handling Widely-Shared Data in Shared-Memory Multiprocessors (Research Note).
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

Programming Languages, Models, and Methods.
Proceedings of the Euro-Par 2000, Parallel Processing, 6th International Euro-Par Conference, Munich, Germany, August 29, 2000

1999
A Linear Algebra Formulation for Optimising Replication in Data Parallel Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

1998
Efficient Interprocedural Data Placement Optimisation in a Parallel Library.
Proceedings of the Languages, 1998

Reactive Proxies: A Flexible Protocol Extension to Reduce ccNUMA Node Controller Contention.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

Data Distribution at Run-Time: Re-using Execution Plans.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

1997
Efficient shared-memory support for parallel graph reduction.
Future Gener. Comput. Syst., 1997

M-Tree: A Parallel Abstract Data Type for Block-Irregular Adaptive Applications.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

Runtime Interprocedural Data Placement Optimisation for Lazy Parallel Libraries (Extended Abstract).
Proceedings of the Euro-Par '97 Parallel Processing, 1997

Backwards-Compatible Bounds Checking for Arrays and Pointers in C Programs.
Proceedings of the Third International Workshop on Automated Debugging, 1997

1996
Cautious, Machine-Independent Performance Tuning for Shared-Memory Multiprocessors.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

Using Proxies to Reduce Controller Contention in Large Shared-Memory Multiprocessors.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

1995
A Lazy, Self-optimizing Parallel Matrix Library.
Proceedings of the Functional Programming, Glasgow, UK, 1995, 1995

1994
Implementing functional languages: S Peyton Jones and D Lester, Prentice-Hall UK (1992), 281 pp, £22.95, ISBN 0 13 721952 0.
Inf. Softw. Technol., 1994

Paragon specifications: Structure, analysis and implementation.
Future Gener. Comput. Syst., 1994

Derivation and performance of a pipelined transaction processor.
Proceedings of the Sixth IEEE Symposium on Parallel and Distributed Processing, 1994

Eliminating Invalidation in Coherent-Cache Parallel Graph Reduction.
Proceedings of the PARLE '94: Parallel Architectures and Languages Europe, 1994

Angel: Resource Unification in a 64-bit Microkernel.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

1993
Parallel Programming Using Skeleton Functions.
Proceedings of the PARLE '93, 1993

Locality and False Sharing in Coherent-Cache Parallel Graph Reduction.
Proceedings of the PARLE '93, 1993

Design and Implementation of an Object-Oriented 64-bit Single Address Space Microkernel.
Proceedings of the USENIX Microkernels and Other Kernel Architectures Symposium, 1993

1990
Parallel object-oriented descriptions of graph reduction machines.
Future Gener. Comput. Syst., 1990

The feasibility of a general-purpose parallel computer using WSI.
Future Gener. Comput. Syst., 1990

1989
Parallel Object-Oriented Descriptions of Graph Reduction Machines (extended abstract).
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989

The Feasibility of a General-purpose Parallel Computer using WSI.
Proceedings of the PARLE '89: Parallel Architectures and Languages Europe, 1989

1987
COBWEB-2: Structured Specification of a Wafer-Scale Supercomputer.
Proceedings of the PARLE, 1987

