Shoaib Kamil

Orcid: 0000-0001-5965-3717

Affiliations:
  • Adobe Systems Inc., Seattle, WA, USA


According to our database, Shoaib Kamil authored at least 66 papers between 2002 and 2024.

Bibliography

2024
IMESH: A DSL for Mesh Processing.
ACM Trans. Graph., October, 2024

2023
Fast Instruction Selection for Fast Digital Signal Processing.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
Searching for Fast Demosaicking Algorithms.
ACM Trans. Graph., 2022

Sparsity-Specific Code Optimization using Expression Trees.
ACM Trans. Graph., 2022

H♥rtDown: Document Processor for Executable Linear Algebra Papers.
Proceedings of the SIGGRAPH Asia 2022 Conference Papers, 2022

A Cross-Platform Benchmark for Interval Computation Libraries.
Proceedings of the Parallel Processing and Applied Mathematics, 2022

Vector instruction selection for digital signal processors using program synthesis.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
I♥LA: compilable markdown for linear algebra.
ACM Trans. Graph., 2021

I♥LA: Compilable Markdown for Linear Algebra.
CoRR, 2021

Domain-Specific Language Abstractions for Compression.
Proceedings of the 31st Data Compression Conference, 2021

Compiling Graph Applications for GPUs with GraphIt.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2021

2020
NASOQ: numerically accurate sparsity-oriented QP solver.
ACM Trans. Graph., 2020

A sparse iteration space transformation framework for sparse tensor algebra.
Proc. ACM Program. Lang., 2020

Verifying and improving Halide's term rewriting system with program synthesis.
Proc. ACM Program. Lang., 2020

Compilation Techniques for Graph Algorithms on GPUs.
CoRR, 2020

A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra.
CoRR, 2020

EGGS: Sparsity-Specific Code Generation.
Comput. Graph. Forum, 2020

Optimizing ordered graph algorithms with GraphIt.
Proceedings of the CGO '20: 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020

2019
Automatically translating image processing libraries to halide.
ACM Trans. Graph., 2019

Modular verification of web page layout.
Proc. ACM Program. Lang., 2019

PriorityGraph: A Unified Programming Model for Optimizing Ordered Graph Algorithms.
CoRR, 2019

Tensor Algebra Compilation with Workspaces.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019

2018
GraphIt: a high-performance graph DSL.
Proc. ACM Program. Lang., 2018

GraphIt - A High-Performance DSL for Graph Analytics.
CoRR, 2018

Automatic Generation of Sparse Tensor Kernels with Workspaces.
CoRR, 2018

ParSy: inspection and transformation of sparse matrix computations for parallelism.
Proceedings of the International Conference for High Performance Computing, 2018

Verifying that web pages have accessible layout.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

2017
The tensor algebra compiler.
Proc. ACM Program. Lang., 2017

Sympiler: transforming sparse matrix codes by decoupling symbolic analysis.
Proceedings of the International Conference for High Performance Computing, 2017

taco: a tool to generate tensor algebra kernels.
Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017

Parallel associative reductions in halide.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

2016
Simit: A Language for Physical Simulation.
ACM Trans. Graph., 2016

Distributed Halide.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Verified lifting of stencil computations.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

2015
Parallel processing of filtered queries in attributed semantic graphs.
J. Parallel Distributed Comput., 2015

Bridging the Gap Between General-Purpose and Domain-Specific Compilers with Synthesis.
Proceedings of the 1st Summit on Advances in Programming Languages, 2015

Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

2014
MSL: A Synthesis Enabled Language for Distributed Implementations.
Proceedings of the International Conference for High Performance Computing, 2014

WOSC 2014: second workshop on optimizing stencil computations.
Proceedings of the SPLASH'14, 2014

OpenTuner: an extensible framework for program autotuning.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Communication-Optimal Parallel Recursive Rectangular Matrix Multiplication.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

High-Productivity and High-Performance Analysis of Filtered Semantic Graphs.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

2012
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages.
PhD thesis, 2012

Auto-tuning the Matrix Powers Kernel with SEJITS.
Proceedings of the High Performance Computing for Computational Science, 2012

Parallel High Performance Bootstrapping in Python.
Proceedings of the 11th Python in Science Conference 2012 (SciPy 2012), 2012

Poster: Beating MKL and ScaLAPACK at Rectangular Matrix Multiplication Using the BFS/DFS Approach.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Portable parallel performance from sequential, productive, embedded domain-specific languages.
Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2012

High-performance analysis of filtered semantic graphs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2012

2011
Bringing Parallel Performance to Python with Domain-Specific Selective Embedded Just-in-Time Specialization.
Proceedings of the 10th Python in Science Conference 2011 (SciPy 2011), Austin, Texas, July 11, 2011

CUDA-level Performance with Python-level Productivity for Gaussian Mixture Model Applications.
Proceedings of the 3rd USENIX Workshop on Hot Topics in Parallelism, 2011

2010
Communication Requirements and Interconnect Optimization for High-End Scientific Applications.
IEEE Trans. Parallel Distributed Syst., 2010

An auto-tuning framework for parallel multicore stencil computations.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Silicon Nanophotonic Network-on-Chip Using TDM Arbitration.
Proceedings of the IEEE 18th Annual Symposium on High Performance Interconnects, 2010

2009
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors.
SIAM Rev., 2009

Energy-Efficient Computing for Extreme-Scale Science.
Computer, 2009

Analysis of photonic networks for a chip multiprocessor using scientific applications.
Proceedings of the Third International Symposium on Networks-on-Chips, 2009

2008
Power efficiency in high performance computing.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
Scientific Computing Kernels on the Cell Processor.
Int. J. Parallel Program., 2007

Scientific Application Performance on Candidate PetaScale Platforms.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

Reconfigurable hybrid interconnection for static and dynamic scientific applications.
Proceedings of the 4th Conference on Computing Frontiers, 2007

2006
The potential of the cell processor for scientific computing.
Proceedings of the Third Conference on Computing Frontiers, 2006

Implicit and explicit optimizations for stencil computations.
Proceedings of the 2006 workshop on Memory System Performance and Correctness, 2006

2005
Analyzing Ultra-Scale Application Communication Requirements for a Reconfigurable Hybrid Interconnect.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005

Impact of modern memory subsystems on cache optimizations for stencil computations.
Proceedings of the 2005 workshop on Memory System Performance, 2005

2002
Performance optimizations and bounds for sparse matrix-vector multiply.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

