Milind Kulkarni

ORCID: 0000-0001-6827-345X

Affiliations:
  • Purdue University, West Lafayette, USA


According to our database, Milind Kulkarni authored at least 103 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Orchard: Heterogeneous Parallelism and Fine-grained Fusion for Complex Tree Traversals.
ACM Trans. Archit. Code Optim., June, 2024

METS-R SIM: A simulator for Multi-modal Energy-optimal Trip Scheduling in Real-time with shared autonomous electric vehicles.
Simul. Model. Pract. Theory, 2024

Optimizing Layout of Recursive Datatypes with Marmoset (Artifact).
Dagstuhl Artifacts Ser., 2024

SABLE: Staging Blocked Evaluation of Sparse Matrix Computations.
CoRR, 2024

Optimizing Layout of Recursive Datatypes with Marmoset.
CoRR, 2024

Mochi: Fast & Exact Collision Detection.
CoRR, 2024

Garbage Collection for Mostly Serialized Heaps.
Proceedings of the 2024 ACM SIGPLAN International Symposium on Memory Management, 2024

Arkade: k-Nearest Neighbor Search With Non-Euclidean Distances using GPU Ray Tracing.
Proceedings of the 38th ACM International Conference on Supercomputing, 2024

Optimizing Layout of Recursive Datatypes with Marmoset: Or, Algorithms + Data Layouts = Efficient Programs.
Proceedings of the 38th European Conference on Object-Oriented Programming, 2024

2023
SparseAuto: An Auto-Scheduler for Sparse Tensor Computations Using Recursive Loop Nest Restructuring.
CoRR, 2023

Generalized Neighbor Search using Commodity Hardware Acceleration.
CoRR, 2023

Targeted Control-flow Transformations for Mitigating Path Explosion in Dynamic Symbolic Execution.
CoRR, 2023

Synthesis of Distributed Agreement-Based Systems with Efficiently-Decidable Verification.
Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems, 2023

RT-DBSCAN: Accelerating DBSCAN using Ray Tracing Hardware.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

RT-kNNS Unbound: Using RT Cores to Accelerate Unrestricted Neighbor Search.
Proceedings of the 37th International Conference on Supercomputing, 2023

HyBF: A Hybrid Branch Fusion Strategy for Code Size Reduction.
Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, 2023

Coyote: A Compiler for Vectorizing Encrypted Arithmetic Circuits.
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023

2022
UniRec: a unimodular-like framework for nested recursions and loops.
Proc. ACM Program. Lang., 2022

Challenges in Firmware Re-Hosting, Emulation, and Analysis.
ACM Comput. Surv., 2022

Synthesis of Distributed Agreement-Based Systems with Efficiently-Decidable Parameterized Verification.
CoRR, 2022

Cornucopia: A Framework for Feedback Guided Generation of Binaries.
Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, 2022

SparseLNR: accelerating sparse tensor computations using loop nest restructuring.
Proceedings of the 2022 International Conference on Supercomputing (ICS '22), Virtual Event, June 28, 2022

DARM: Control-Flow Melding for SIMT Thread Divergence Reduction.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021
Efficient tree-traversals: reconciling parallelism and dense data representations.
Proc. ACM Program. Lang., 2021

QuickSilver: modeling and parameterized verification for distributed agreement-based systems.
Proc. ACM Program. Lang., 2021

CFM: SIMT Thread Divergence Reduction by Melding Similar Control-Flow Regions in GPGPU Programs.
CoRR, 2021

Vectorized secure evaluation of decision forests.
Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI '21), 2021


2020
Vision Paper: Grand Challenges in Resilience: Autonomous System Resilience through Design and Runtime Measures.
IEEE Open J. Comput. Soc., 2020

HACCLE: An Ecosystem for Building Secure Multi-Party Computations.
CoRR, 2020

Parameterized Reasoning for Distributed Systems with Consensus.
CoRR, 2020

Parameterized Verification of Systems with Global Synchronization and Guards.
Proceedings of the Computer Aided Verification - 32nd International Conference, 2020

2019
Processor-Oblivious Record and Replay.
ACM Trans. Parallel Comput., 2019

Extracting SIMD Parallelism from Recursive Task-Parallel Programs.
ACM Trans. Parallel Comput., 2019

A-RESCUE 2.0: A High-Fidelity, Parallel, Agent-Based Evacuation Simulator.
J. Comput. Civ. Eng., 2019

Grand Challenges of Resilience: Autonomous System Resilience through Design and Runtime Measures.
CoRR, 2019

Sound, Fine-Grained Traversal Fusion for Heterogeneous Trees - Extended Version.
CoRR, 2019

D2P: from recursive formulations to distributed-memory codes.
Proceedings of the International Conference for High Performance Computing, 2019

LoCal: a language for programs operating on serialized data.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Composable, sound transformations of nested recursion and loops.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

Sound, fine-grained traversal fusion for heterogeneous trees.
Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2019

XSTRESSOR: Automatic Generation of Large-Scale Worst-Case Test Inputs by Inferring Path Conditions.
Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification, 2019

PySE: Automatic Worst-Case Test Generation by Reinforcement Learning.
Proceedings of the 12th IEEE Conference on Software Testing, Validation and Verification, 2019

Efficient GPU tree walks for effective distributed n-body simulations.
Proceedings of the ACM International Conference on Supercomputing, 2019

MULKSG: MULtiple K Simultaneous Graph Assembly.
Proceedings of the Algorithms for Computational Biology - 6th International Conference, 2019

2017
TreeFuser: a framework for analyzing and fusing general recursive tree traversals.
Proc. ACM Program. Lang., 2017

Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017

Treelogy: A benchmark suite for tree traversals.
Proceedings of the 2017 IEEE International Symposium on Performance Analysis of Systems and Software, 2017

SPIRIT: a framework for creating distributed recursive tree applications.
Proceedings of the International Conference on Supercomputing, 2017

Efficient Collaborative Approximation in MapReduce without Missing Rare Keys.
Proceedings of the 2017 International Conference on Cloud and Autonomic Computing, 2017

Compiling Tree Transforms to Operate on Packed Representations.
Proceedings of the 31st European Conference on Object-Oriented Programming, 2017

Legato: end-to-end bounded region serializability using commodity hardware transactional memory.
Proceedings of the 2017 International Symposium on Code Generation and Optimization, 2017

Data structure-aware heap partitioning.
Proceedings of the 26th International Conference on Compiler Construction, 2017

Scalable Genomic Assembly through Parallel de Bruijn Graph Construction for Multiple K-mers.
Proceedings of the 8th ACM International Conference on Bioinformatics, 2017

Locality Transformations for Nested Recursive Iteration Spaces.
Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, 2017

2016
SPIRIT: a runtime system for distributed irregular tree applications.
Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2016

Evaluating Performance of Task and Data Coarsening in Concurrent Collections.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Locality-Aware Task-Parallel Execution on GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Treelogy: a benchmark suite for tree traversal applications.
Proceedings of the 2016 IEEE International Symposium on Workload Characterization, 2016

SARVAVID: A Domain Specific Language for Developing Scalable Computational Genomics Applications.
Proceedings of the 2016 International Conference on Supercomputing, 2016

Hybrid CPU-GPU scheduling and execution of tree traversals.
Proceedings of the 2016 International Conference on Supercomputing, 2016

2015
Debugging high-performance computing applications at massive scales.
Commun. ACM, 2015

Optimizing the LULESH stencil code using concurrent collections.
Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization.
Proceedings of the Principles and Practices of Programming on The Java Platform, 2015

Efficient Deterministic Replay of Multithreaded Executions in a Managed Language Virtual Machine.
Proceedings of the Principles and Practices of Programming on The Java Platform, 2015

Tree dependence analysis.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

Efficient execution of recursive programs on commodity vector hardware.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

SemCache++: Semantics-Aware Caching for Efficient Multi-GPU Offloading.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Beyond Big Data - Rethinking Programming Languages for Non-persistent Data.
Proceedings of the International Conference on Cloud Computing and Big Data, 2015

Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability.
Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, 2015

2014
Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization.
Proceedings of the International Conference for High Performance Computing, 2014

2013
General transformations for GPU execution of tree traversals.
Proceedings of the International Conference for High Performance Computing, 2013

WuKong: effective diagnosis of bugs at large system scales.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2013

OCTET: capturing and controlling cross-thread dependences efficiently.
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013

Exploiting domain knowledge to optimize parallel computational mechanics codes.
Proceedings of the International Conference on Supercomputing, 2013

SemCache: semantics-aware caching for efficient GPU offloading.
Proceedings of the International Conference on Supercomputing, 2013

WuKong: automatically detecting and localizing bugs that manifest at large system scales.
Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, 2013

EventWave: programming model and runtime support for tightly-coupled elastic cloud applications.
Proceedings of the ACM Symposium on Cloud Computing, SOCC '13, 2013

Automatic vectorization of tree traversals.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Automatically enhancing locality for tree traversals with traversal splicing.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

ABHRANTA: Locating Bugs that Manifest at Large System Scales.
Proceedings of the Eighth Workshop on Hot Topics in System Dependability, HotDep 2012, 2012

Programming Model Support for Dependable, Elastic Cloud Applications.
Proceedings of the Eighth Workshop on Hot Topics in System Dependability, HotDep 2012, 2012

2011
Brief announcement: locality-enhancing loop transformations for tree traversal algorithms.
Proceedings of the 23rd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2011), 2011

The tao of parallelism in algorithms.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

Exploiting the commutativity lattice.
Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, 2011

Enhancing locality for recursive traversals of recursive structures.
Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

μSETL: A set based programming abstraction for wireless sensor networks.
Proceedings of the 10th International Conference on Information Processing in Sensor Networks, 2011

Vrisha: using scaling properties of parallel programs for bug detection and localization.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

InContext: simple parallelism for distributed applications.
Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing, 2011

Techniques for Fine-Grained, Multi-site Computation Offloading.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Towards architecture independent metrics for multicore performance analysis.
SIGMETRICS Perform. Evaluation Rev., 2010

Brief announcement: locality-aware load balancing for speculatively-parallelized irregular applications.
Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2010), 2010

Structure-driven optimizations for amorphous data-parallel programs.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Accelerating multicore reuse distance analysis with sampling and parallelization.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Optimistic parallelism requires abstractions.
Commun. ACM, 2009

How much parallelism is there in irregular applications?
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Lonestar: A suite of parallel irregular programs.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2009

2008
The Galois System: Optimistic Parallelization of Irregular Programs.
PhD thesis, 2008

An Experimental Study of Self-Optimizing Dense Linear Algebra Software.
Proc. IEEE, 2008

Scheduling strategies for optimistic parallel execution of irregular programs.
Proceedings of the 20th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2008), 2008

On the Scalability of an Automatically Parallelized Irregular Application.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Optimistic parallelism benefits from data partitioning.
Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008

2007
Scheduling Issues in Optimistic Parallelization.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

