Vinod Grover

Orcid: 0000-0003-0115-3896

According to our database, Vinod Grover authored at least 32 papers between 2008 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving.
CoRR, January, 2025

Pattern Matching in AI Compilers and Its Formalization.
Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization, 2025

Scaling Deep Learning Training with MPMD Pipeline Parallelism.
CoRR, 2024

Pattern Matching in AI Compilers and its Formalization (Extended Version).
CoRR, 2024

Graphene: An IR for Optimized Tensor Computations on GPUs.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

Axon: A Language for Dynamic Shapes in Deep Learning Graphs.
CoRR, 2022

Probabilistic Programming with CuPPL.
CoRR, 2020

Automatic Kernel Generation for Volta Tensor Cores.
CoRR, 2020

Fireiron: A Scheduling Language for High-Performance Linear Algebra on GPUs.
CoRR, 2020

Fireiron: A Data-Movement-Aware Scheduling Language for GPUs.
Proceedings of PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

Automatic acceleration of Numpy applications on GPUs and multicore CPUs.
CoRR, 2019

Swizzle Inventor: Data Movement Synthesis for GPU Kernels.
Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, 2019

Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations.
Proc. IEEE, 2018

CURD: a dynamic CUDA race detector.
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018

Diesel: DSL for linear algebra and neural net computations on GPUs.
Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages, 2018

Effective resource management for enhancing performance of 2D and 3D stencils on GPUs.
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016

Resource Conscious Reuse-Driven Tiling for GPUs.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

Forma: a DSL for image processing applications to target GPUs and multi-core CPUs.
Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 2015

Fusing convolution kernels through tiling.
Proceedings of the 2nd ACM SIGPLAN International Workshop on Libraries, 2015

Type-safe runtime code generation: accelerate to LLVM.
Proceedings of the 8th ACM SIGPLAN Symposium on Haskell, 2015

NOVA: A Functional Language for Data Parallelism.
ARRAY'14: Proceedings of the 2014 ACM SIGPLAN International Workshop on Libraries, 2014

Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures.
Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, 2014

LambdaJIT: a dynamic compiler for heterogeneous optimizations of STL algorithms.
Proceedings of the 3rd ACM SIGPLAN workshop on Functional high-performance computing, 2014

Separate Compilation in a Language-Integrated Heterogeneous Environment.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

Towards shared memory consistency models for GPUs.
Proceedings of the International Conference on Supercomputing, 2013

Convergence and scalarization for data-parallel architectures.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

CUDA: Compiling and optimizing for a GPU platform.
Proceedings of the International Conference on Computational Science, 2012

JaBEE: framework for object-oriented Java bytecode compilation and execution on graphics processor units.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012

Scalable Manycore Computing with CUDA.
Fundamentals of Multicore Software Development, 2012

Accelerating Haskell array codes with multicore GPUs.
Proceedings of the POPL 2011 Workshop on Declarative Aspects of Multicore Programming, 2011

Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs.
Proceedings of CGO 2010, 2010

Samurai: protecting critical data in unsafe languages.
Proceedings of the 2008 EuroSys Conference, Glasgow, Scotland, UK, April 1-4, 2008
