Vivek Sarkar

ORCID: 0000-0002-3433-8830

Affiliations:
  • Georgia Institute of Technology, GA, USA
  • Rice University, Department of Computer Science (former)
  • IBM Research (former)


According to our database, Vivek Sarkar authored at least 293 papers between 1986 and 2024.

Awards

ACM Fellow 2008, "For contributions to technologies for parallel computing."

Bibliography

2024
Fully Verified Instruction Scheduling.
Proc. ACM Program. Lang., 2024

Asynchronous Distributed Actor-Based Approach to Jaccard Similarity for Genome Comparisons.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Asynchronous Distributed-Memory Parallel Algorithms for Influence Maximization.
Proceedings of the International Conference for High Performance Computing, 2024

Visualizing Correctness Issues in OpenMP Programs.
Proceedings of the Advancing OpenMP for Future Accelerators, 2024

Bottleneck Scenarios in Use of the Conveyors Message Aggregation Library.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2024

A Distributed, Asynchronous Algorithm for Large-Scale Internet Network Topology Analysis.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

APPy: Annotated Parallelism for Python on GPUs.
Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024

2023
Concrete Type Inference for Code Optimization using Machine Learning with SMT Solving.
Proc. ACM Program. Lang., October, 2023

A Fine-grained Asynchronous Bulk Synchronous parallelism model for PGAS applications.
J. Comput. Sci., May, 2023

HIPLZ: Enabling performance portability for exascale systems.
Concurr. Comput. Pract. Exp., 2023

Towards Safe HPC: Productivity and Performance via Rust Interfaces for a Distributed C++ Actors Library (Work in Progress).
Proceedings of the 20th ACM SIGPLAN International Conference on Managed Programming Languages and Runtimes, 2023

Enabling Multi-threading in Heterogeneous Quantum-Classical Programming Models.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Dynamic Determinacy Race Detection for Task-Parallel Programs with Promises.
Proceedings of the 37th European Conference on Object-Oriented Programming, 2023

Highly Scalable Large-Scale Asynchronous Graph Processing using Actors.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators.
ACM Trans. Archit. Code Optim., 2022

OpenMP application experiences: Porting to accelerated nodes.
Parallel Comput., 2022

A Multi-Level Platform-Independent GPU API for High-Level Programming Models.
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

A Productive and Scalable Actor-Based Programming System for PGAS Applications.
Proceedings of the Computational Science - ICCS 2022, 2022

Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing.
Proceedings of the Euro-Par 2022: Parallel Processing, 2022

Leveraging the Dynamic Program Structure Tree to Detect Data Races in OpenMP Programs.
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

MiniKokkos: A Calculus of Portable Parallelism.
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

Memory access scheduling to reduce thread migrations.
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022

ReACT: Redundancy-Aware Code Generation for Tensor Expressions.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022

Introduction.
Proceedings of the SSA-based Compiler Design, 2022

Array SSA Form.
Proceedings of the SSA-based Compiler Design, 2022

2021
Linear Promises: Towards Safer Concurrent Programming (Artifact).
Dagstuhl Artifacts Ser., 2021

A Scalable Actor-based Programming System for PGAS Runtimes.
CoRR, 2021

An ownership policy and deadlock detector for promises.
Proceedings of the PPoPP '21: 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2021

SHMEM-ML: Leveraging OpenSHMEM and Apache Arrow for Scalable, Composable Machine Learning.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Exascale and Smart Networks, 2021

ARBALEST: Dynamic Detection of Data Mapping Issues in Heterogeneous OpenMP Applications.
Proceedings of the 35th IEEE International Parallel and Distributed Processing Symposium, 2021

Task-graph scheduling extensions for efficient synchronization and communication.
Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Linear Promises: Towards Safer Concurrent Programming.
Proceedings of the 35th European Conference on Object-Oriented Programming, 2021

2020
MAESTRO: A Data-Centric Approach to Understand Reuse, Performance, and Hardware Cost of DNN Mappings.
IEEE Micro, 2020

Advanced Graph-Based Deep Learning for Probabilistic Type Inference.
CoRR, 2020

MISIM: An End-to-End Neural Code Similarity System.
CoRR, 2020

Context-Aware Parse Trees.
CoRR, 2020

MARVEL: A Decoupled Model-driven Approach for Efficiently Mapping Convolutions on Spatial DNN Accelerators.
CoRR, 2020

Integrating Inter-Node Communication with a Resilient Asynchronous Many-Task Runtime System.
Proceedings of the Workshop on Exascale MPI, 2020

HOOVER: Leveraging OpenSHMEM for High Performance, Flexible Streaming Graph Applications.
Proceedings of the 3rd IEEE/ACM Annual Parallel Applications Workshop: Alternatives To MPI+X, 2020

Intrepydd: performance, productivity, and portability for data science application kernels.
Proceedings of the 2020 ACM SIGPLAN International Symposium on New Ideas, 2020

An Affine Scheduling Framework for Integrating Data Layout and Loop Transformations.
Proceedings of the Languages and Compilers for Parallel Computing, 2020

A Study of Memory Anomalies in OpenMP Applications.
Proceedings of the OpenMP: Portable Multi-Level Parallelism on Modern Systems, 2020

Exploring a multi-resolution GPU programming model for Chapel.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Vyasa: A High-Performance Vectorizing Compiler for Tensor Convolutions on the Xilinx AI Engine.
Proceedings of the 2020 IEEE High Performance Extreme Computing Conference, 2020

OmpMemOpt: Optimized Memory Movement for Heterogeneous Computing.
Proceedings of the Euro-Par 2020: Parallel Processing, 2020

2019
Performance evaluation of OpenMP's target construct on GPUs - exploring compiler optimisations.
Int. J. High Perform. Comput. Netw., 2019

Transitive joins: a sound and efficient online deadlock-avoidance policy.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach.
Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019

Common Subexpression Convergence: A New Code Optimization for SIMT Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2019

OMPSan: Static Verification of OpenMP's Data Mapping Constructs.
Proceedings of the OpenMP: Conquering the Full Hardware Spectrum, 2019

High Performance Multilevel Graph Partitioning on GPU.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

Experimental Insights from the Rogues Gallery.
Proceedings of the 2019 IEEE International Conference on Rebooting Computing, 2019

Optimized Execution of Parallel Loops via User-Defined Scheduling Policies.
Proceedings of the 48th International Conference on Parallel Processing, 2019

Data Flow Execution Models - A Third Opinion.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

T2S-Tensor: Productively Generating High-Performance Spatial Hardware for Dense Tensor Computations.
Proceedings of the 27th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2019

Enabling Resilience in Asynchronous Many-Task Programming Models.
Proceedings of the Euro-Par 2019: Parallel Processing, 2019

Valence: variable length calling context encoding.
Proceedings of the 28th International Conference on Compiler Construction, 2019

2018
Race Detection in Two Dimensions.
ACM Trans. Parallel Comput., 2018

Compile-Time Library Call Detection Using CAASCADE and XALT.
Proceedings of the High Performance Computing, 2018

Porting DMRG++ Scientific Application to OpenPOWER.
Proceedings of the High Performance Computing, 2018

Detecting MPI usage anomalies via partial program symbolic execution.
Proceedings of the International Conference for High Performance Computing, 2018

Using Polyhedral Analysis to Verify OpenMP Applications are Data Race Free.
Proceedings of the 2nd IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2018

A One Year Retrospective on a MOOC in Parallel, Concurrent, and Distributed Programming in Java.
Proceedings of the 2018 IEEE/ACM Workshop on Education for High-Performance Computing, 2018

A Unified Runtime for PGAS and Event-Driven Programming.
Proceedings of the 4th International Workshop on Extreme Scale Programming Models and Middleware, 2018

A Preliminary Study of Compiler Transformations for Graph Applications on the Emu System.
Proceedings of the Workshop on Memory Centric High Performance Computing, 2018

SHCOLL - A Standalone Implementation of OpenSHMEM-Style Collectives API.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

HOOVER: Distributed, Flexible, and Scalable Streaming Graph Processing on OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Topkapi: Parallel and Fast Sketches for Finding Top-K Frequent Elements.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

In-Register Parameter Caching for Dynamic Neural Nets with Virtual Persistent Processor Specialization.
Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 2018

A Unified Approach to Variable Renaming for Enhanced Vectorization.
Proceedings of the Languages and Compilers for Parallel Computing, 2018

RegMutex: Inter-Warp GPU Register Time-Sharing.
Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture, 2018

GT-Race: Graph Traversal Based Data Race Detection for Asynchronous Many-Task Parallelism.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Using Dynamic Compilation to Achieve Ninja Performance for CNN Training on Many-Core Processors.
Proceedings of the Euro-Par 2018: Parallel Processing, 2018

S2FA: an accelerator automation framework for heterogeneous computing in datacenters.
Proceedings of the 55th Annual Design Automation Conference, 2018

MiniApp for Density Matrix Renormalization Group Hamiltonian Application Kernel.
Proceedings of the IEEE International Conference on Cluster Computing, 2018

Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling.
Proceedings of the 27th International Conference on Compiler Construction, 2018

Parallel sparse flow-sensitive points-to analysis.
Proceedings of the 27th International Conference on Compiler Construction, 2018

Cost-driven thread coarsening for GPU kernels.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, 2018

2017
Deadlock avoidance in parallel programs with futures: why parallel tasks should not wait for strangers.
Proc. ACM Program. Lang., 2017

Pedagogy and tools for teaching parallel computing at the sophomore undergraduate level.
J. Parallel Distributed Comput., 2017

Formalization of Habanero phasers using Coq.
J. Log. Algebraic Methods Program., 2017

Exploration of Supervised Machine Learning Techniques for Runtime Selection of CPU vs. GPU Execution in Java Programs.
Proceedings of the Accelerator Programming Using Directives - 4th International Workshop, 2017

Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Graph500 on OpenSHMEM: Using A Practical Survey of Past Work to Motivate Novel Algorithmic Developments.
Proceedings of PAW@SC 2017: Second Annual PGAS Applications Workshop, 2017

DAMMP: A Distributed Actor Model for Mobile Platforms.
Proceedings of the 14th International Conference on Managed Languages and Runtimes, 2017

Implementation and Evaluation of OpenSHMEM Contexts Using OFI Libfabric.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

A marshalled data format for pointers in relocatable data blocks.
Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 2017

Preparing an Online Java Parallel Computing Course.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

A Pluggable Framework for Composable HPC Scheduling Libraries.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Optimized two-level parallelization for GPU accelerators using the polyhedral model.
Proceedings of the 26th International Conference on Compiler Construction, 2017

2016
HadoopCL2: Motivating the Design of a Distributed, Heterogeneous Programming System With Machine-Learning Applications.
IEEE Trans. Parallel Distributed Syst., 2016

SCnC: Efficient Unification of Streaming with Dynamic Task Parallelism.
Int. J. Parallel Program., 2016

A survey of sparse matrix-vector multiplication performance on large matrices.
CoRR, 2016

Formalization of Phase Ordering.
Proceedings of the Ninth workshop on Programming Language Approaches to Concurrency- and Communication-cEntric Software, 2016

Brief Announcement: Dynamic Determinacy Race Detection for Task Parallelism with Futures.
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016

Static Cost Estimation for Data Layout Selection on GPUs.
Proceedings of the 7th International Workshop on Performance Modeling, 2016

PIPES: a language and compiler for task-based programming on distributed-memory clusters.
Proceedings of the International Conference for High Performance Computing, 2016

Exploring Compiler Optimization Opportunities for the OpenMP 4.x Accelerator Model on a POWER8+GPU Platform.
Proceedings of the Third Workshop on Accelerator Programming Using Directives, 2016

Fine-Grained Parallelism in Probabilistic Parsing with Habanero Java.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Optimized Distributed Work-Stealing.
Proceedings of the 6th Workshop on Irregular Applications: Architecture and Algorithms, 2016

Dynamic Determinacy Race Detection for Task Parallelism with Futures.
Proceedings of the Runtime Verification - 16th International Conference, 2016

A Distributed Selectors Runtime System for Java Applications.
Proceedings of the 13th International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools, Lugano, Switzerland, August 29, 2016

Integrating Asynchronous Task Parallelism with OpenSHMEM.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

Automatic parallelization of pure method calls via conditional future synthesis.
Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, 2016

An Extended Polyhedral Model for SPMD Programs and Its Use in Static Data Race Detection.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

OpenMP as a High-Level Specification Language for Parallelism - And its use in Evaluating Parallel Programming Systems.
Proceedings of the OpenMP: Memory, Devices, and Tasks, 2016

Efficient Checkpointing of Multi-threaded Applications as a Tool for Debugging, Performance Tuning, and Resiliency.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Declarative Tuning for Locality in Parallel Programs.
Proceedings of the 45th International Conference on Parallel Processing, 2016

The Open Community Runtime: A runtime system for extreme scale computing.
Proceedings of the 2016 IEEE High Performance Extreme Computing Conference, 2016

SWAT: A Programmable, In-Memory, Distributed, High-Performance Computing Platform.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Automatic data layout generation and kernel mapping for CPU+GPU architectures.
Proceedings of the 25th International Conference on Compiler Construction, 2016

Tree-based Read-only Data Chunks for NVRAM Programming.
Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing, 2016

2015
JPF Verification of Habanero Java Programs using Gradual Type Permission Regions.
ACM SIGSOFT Softw. Eng. Notes, 2015

The Eureka Programming Model for Speculative Task Parallelism (Artifact).
Dagstuhl Artifacts Ser., 2015

Finding Tizen security bugs through whole-system static analysis.
CoRR, 2015

LLVM-based communication optimizations for PGAS programs.
Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015

Auto-grading for parallel programs.
Proceedings of the Workshop on Education for High-Performance Computing, 2015

Machine-Learning-based Performance Heuristics for Runtime CPU/GPU Selection.
Proceedings of the Principles and Practices of Programming on The Java Platform, 2015

HJ-OpenCL: Reducing the Gap Between the JVM and Accelerators.
Proceedings of the Principles and Practices of Programming on The Java Platform, 2015

Parallelizing a discrete event simulation application using the Habanero-Java multicore library.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

Exploiting parallelism in mobile devices.
Proceedings of the Companion Proceedings of the 2015 ACM SIGPLAN International Conference on Systems, 2015

Polyhedral Optimizations for a Data-Flow Graph Language.
Proceedings of the Languages and Compilers for Parallel Computing, 2015

Model Checking Task Parallel Programs Using Gradual Permissions (N).
Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering, 2015

Heterogeneous Habanero-C (H2C): A Portable Programming Model for Heterogeneous Processors.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015

Heterogeneous work-stealing across CPU and DSP cores.
Proceedings of the 2015 IEEE High Performance Extreme Computing Conference, 2015

Data Layout Optimization for Portable Performance.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Elastic Tasks: Unifying Task Parallelism and SPMD Parallelism with an Adaptive Runtime.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

A Composable Deadlock-Free Approach to Object-Based Isolation.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

Load Balancing Prioritized Tasks via Work-Stealing.
Proceedings of the Euro-Par 2015: Parallel Processing, 2015

The Eureka Programming Model for Speculative Task Parallelism.
Proceedings of the 29th European Conference on Object-Oriented Programming, 2015

Compiling and Optimizing Java 8 Programs for GPU Execution.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Polyhedral Optimizations of Explicitly Parallel Programs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

Extending Polyhedral Model for Analysis and Transformation of OpenMP Programs.
Proceedings of the 2015 International Conference on Parallel Architectures and Compilation, 2015

2014
Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations.
Proceedings of the International Conference for High Performance Computing, 2014

Habanero-Java library: a Java 8 framework for multicore programming.
Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java Platform Virtual Machines, 2014

Test-driven repair of data races in structured parallel programs.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Exploiting Implicit Parallelism in Dynamic Array Programming Languages.
Proceedings of the ARRAY'14: Proceedings of the 2014 ACM SIGPLAN International Workshop on Libraries, 2014

HabaneroUPC++: a Compiler-free PGAS Library.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

HJ-Viz: a new tool for visualizing, debugging and optimizing parallel programs.
Proceedings of the SPLASH'14, 2014

Cooperative Scheduling of Parallel Tasks with General Synchronization Patterns.
Proceedings of the ECOOP 2014 - Object-Oriented Programming - 28th European Conference, Uppsala, Sweden, July 28, 2014

Inter-iteration Scalar Replacement Using Array SSA Form.
Proceedings of the Compiler Construction - 23rd International Conference, 2014

Savina - An Actor Benchmark Suite: Enabling Empirical Evaluation of Actor Libraries.
Proceedings of the 4th International Workshop on Programming based on Actors Agents & Decentralized Control, 2014

Selectors: Actors with Multiple Guarded Mailboxes.
Proceedings of the 4th International Workshop on Programming based on Actors Agents & Decentralized Control, 2014

Bounded memory scheduling of dynamic task graphs.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

ADHA: automatic data layout framework for heterogeneous architectures.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
A Transformation Framework for Optimizing Task-Parallel Programs.
ACM Trans. Program. Lang. Syst., 2013

A decoupled non-SSA global register allocation using bipartite liveness graphs.
ACM Trans. Archit. Code Optim., 2013

Automatic detection of inter-application permission leaks in Android applications.
IBM J. Res. Dev., 2013

Accelerating Habanero-Java programs with OpenCL generation.
Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, 2013

Isolation for nested task parallelism.
Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages & Applications, 2013

Speculative Execution of Parallel Programs with Precise Exception Semantics on GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

Expressing DOACROSS Loop Dependences in OpenMP.
Proceedings of the OpenMP in the Era of Low Power Devices and Accelerators, 2013

HadoopCL: MapReduce on Distributed Heterogeneous Platforms through Seamless Integration of Hadoop and OpenCL.
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013

Integrating Asynchronous Task Parallelism with MPI.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Compiler-Driven Data Layout Transformation for Heterogeneous Platforms.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Interprocedural strength reduction of critical sections in explicitly-parallel programs.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

2012
Efficient data race detection for async-finish parallelism.
Formal Methods Syst. Des., 2012

Design, verification and applications of a new read-write lock algorithm.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Scalable and precise dynamic datarace detection for structured parallelism.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012

Integrating task parallelism with actors.
Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2012

Mapping a data-flow programming model onto heterogeneous platforms.
Proceedings of the SIGPLAN/SIGBED Conference on Languages, 2012

Finish Accumulators: An Efficient Reduction Construct for Dynamic Task Parallelism.
Proceedings of the Languages and Compilers for Parallel Computing, 2012

A Practical Approach to DOACROSS Parallelization.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Folding of Tagged Single Assignment Values for Memory-Efficient Parallelism.
Proceedings of the Euro-Par 2012 Parallel Processing - 18th International Conference, 2012

Practical Permissions for Race-Free Parallelism.
Proceedings of the ECOOP 2012 - Object-Oriented Programming, 2012

Analytical Bounds for Optimal Tile Size Selection.
Proceedings of the Compiler Construction - 21st International Conference, 2012

2011
Concurrent Collections Programming Model.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Customizable Domain-Specific Computing.
IEEE Des. Test Comput., 2011

Permission Regions for Race-Free Parallelism.
Proceedings of the Runtime Verification - Second International Conference, 2011

Integrating MPI with Asynchronous Task Parallelism.
Proceedings of the Recent Advances in the Message Passing Interface, 2011

DrHJ: a lightweight pedagogic IDE for Habanero Java.
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, 2011

Habanero-Java: the new adventures of old X10.
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, 2011

Intermediate language extensions for parallelism.
Proceedings of the SPLASH'11 Workshops, 2011

Delegated isolation.
Proceedings of the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

DrHJ: the cure to your multicore programming woes.
Proceedings of the Companion to the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

The design and implementation of the habanero-java parallel programming language.
Proceedings of the Companion to the 26th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2011

Dynamic Task Parallelism with a GPU Work-Stealing Runtime System.
Proceedings of the Languages and Compilers for Parallel Computing, 2011

Unifying Barrier and Point-to-Point Synchronization in OpenMP with Phasers.
Proceedings of the OpenMP in the Petascale Era - 7th International Workshop on OpenMP, 2011

Communication Optimizations for Distributed-Memory X10 Programs.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Data-Driven Tasks and Their Implementation.
Proceedings of the International Conference on Parallel Processing, 2011

Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures.
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011

CnC-Hadoop: a graphical coordination language for distributed multiscale parallelism.
Proceedings of the 8th Conference on Computing Frontiers, 2011

Subregion Analysis and Bounds Check Elimination for High Level Arrays.
Proceedings of the Compiler Construction - 20th International Conference, 2011

2010
Concurrent Collections.
Sci. Program., 2010

Building confidence in multicore software.
Commun. ACM, 2010

Automatic Verification of Determinism for Structured Parallel Programs.
Proceedings of the Static Analysis - 17th International Symposium, 2010

SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems.
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2010

Comparing the usability of library vs. language approaches to task parallelism.
Proceedings of the 2nd ACM SIGPLAN Workshop on Evaluation and Usability of Programming Languages and Tools, 2010

Efficient Selection of Vector Instructions Using Dynamic Programming.
Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, 2010

CnC-CUDA: Declarative Programming for GPUs.
Proceedings of the Languages and Compilers for Parallel Computing, 2010

Hierarchical phasers for scalable synchronization and reductions in dynamic parallelism.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

SLAW: A scalable locality-aware adaptive work-stealing scheduler.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Extreme scale computing: Challenges and opportunities.
Proceedings of the 16th International Conference on High-Performance Computer Architecture (HPCA-16 2010), 2010

A Study of a Software Cache Implementation of the OpenMP Memory Model for Multicore and Manycore Architectures.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Reducing task creation and termination overhead in explicitly parallel programs.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

Automatic vector instruction selection for dynamic compilation.
Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 2010

2009
Programmability Issues.
Int. J. High Perform. Comput. Appl., 2009

Declarative aspects of memory management in the concurrent collections parallel programming model.
Proceedings of the POPL 2009 Workshop on Declarative Aspects of Multicore Programming, 2009

The habanero multicore software research project.
Proceedings of the Companion to the 24th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2009

Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement.
Proceedings of the Languages and Compilers for Parallel Computing, 2009

Phaser accumulators: A new reduction construct for dynamic parallelism.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Work-first and help-first scheduling policies for async-finish task parallelism.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Chunking parallel loops in the presence of synchronization.
Proceedings of the 23rd international conference on Supercomputing, 2009

JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

Challenges in Code Optimization of Parallel Programs.
Proceedings of the Compiler Construction, 18th International Conference, 2009

Interprocedural Load Elimination for Dynamic Optimization of Parallel Programs.
Proceedings of the PACT 2009, 2009

2008
Type inference for locality analysis of distributed data structures.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

Minimum Lock Assignment: A Method for Exploiting Concurrency among Critical Sections.
Proceedings of the Languages and Compilers for Parallel Computing, 2008

Array optimizations for parallel implementations of high productivity languages.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Phasers: a unified deadlock-free construct for collective and point-to-point synchronization.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Code optimization of parallel programs: evolutionary vs. revolutionary approaches.
Proceedings of the Sixth International Symposium on Code Generation and Optimization (CGO 2008), 2008

2007
Deadlock-free scheduling of X10 computations with bounded resources.
Proceedings of the 19th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2007), 2007

Optimized lock assignment and allocation: a method for exploiting concurrency among critical sections.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

X10: concurrent programming for modern architectures.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

May-happen-in-parallel analysis of X10 programs.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Language Extensions in Support of Compiler Parallelization.
Proceedings of the Languages and Compilers for Parallel Computing, 2007

Programming Challenges for Petascale and Multicore Parallel Systems.
Proceedings of the High Performance Computing and Communications, 2007

Optimizing Array Accesses in High Productivity Languages.
Proceedings of the High Performance Computing and Communications, 2007

Extended Linear Scan: An Alternate Foundation for Global Register Allocation.
Proceedings of the Compiler Construction, 16th International Conference, 2007

2006
The Role of Programming Languages in Future Data-Centric and Net-Centric Applications.
Proceedings of the Distributed Computing and Internet Technology, 2006

Enhanced Bitwidth-Aware Register Allocation.
Proceedings of the Compiler Construction, 15th International Conference, 2006

2005
The Jikes Research Virtual Machine project: Building an open-source research community.
IBM Syst. J., 2005

Immutability specification and its applications.
Concurr. Pract. Exp., 2005

XJ: facilitating XML processing in Java.
Proceedings of the 14th international conference on World Wide Web, 2005

X10: an object-oriented approach to non-uniform cluster computing.
Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2005

2004
XJ: integration of XML processing into Java.
Proceedings of the 13th international conference on World Wide Web, 2004

Decentralizing execution of composite web services.
Proceedings of the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, 2004

2003
PPPJ 2003: invited talk.
Proceedings of the 2nd International Symposium on Principles and Practice of Programming in Java, 2003

Integrating Database and Programming Language Constraints.
Proceedings of the Database Programming Languages, 9th International Workshop, 2003

2002
Efficient and Precise Datarace Detection for Multithreaded Object-Oriented Programs.
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

2001
Reducing the overhead of dynamic compilation.
Softw. Pract. Exp., 2001

Optimized Unrolling of Nested Loops.
Int. J. Parallel Program., 2001

Program analysis for safety guarantees in a Java virtual machine written in Java.
Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis For Software Tools and Engineering, 2001

Dynamic Optimistic Interprocedural Analysis: A Framework and an Application.
Proceedings of the 2001 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 2001

Register-sensitive selection, duplication, and sequencing of instructions.
Proceedings of the 15th international conference on Supercomputing, 2001

High-Performance Scalable Java Virtual Machines.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

Efficient Dependence Analysis for Java Arrays.
Proceedings of the Euro-Par 2001: Parallel Processing, 2001

2000
Location Consistency-A New Memory Model and Cache Consistency Protocol.
IEEE Trans. Computers, 2000

Lightweight Object-Oriented Shared Variables for Cluster Computing in Java.
J. Parallel Distributed Comput., 2000

The Jalapeño virtual machine.
IBM Syst. J., 2000

Unified Analysis of Array and Object References in Strongly Typed Languages.
Proceedings of the Static Analysis, 7th International Symposium, 2000

ABCD: eliminating array bounds checks on demand.
Proceedings of the 2000 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2000

An analytical model for loop tiling and its solution.
Proceedings of the 2000 IEEE International Symposium on Performance Analysis of Systems and Software, 2000

Dynamic compilation in Jalapeño (Panel Session).
Proceedings of ACM SIGPLAN Workshop on Dynamic and Adaptive Compilation and Optimization (Dynamo 2000), 2000

A comparative study of static and profile-based heuristics for inlining.
Proceedings of ACM SIGPLAN Workshop on Dynamic and Adaptive Compilation and Optimization (Dynamo 2000), 2000

1999
Linear scan register allocation.
ACM Trans. Program. Lang. Syst., 1999

Efficient and precise modeling of exceptions for the analysis of Java programs.
ACM SIGSOFT Softw. Eng. Notes, 1999

Compilation techniques for parallel systems.
Parallel Comput., 1999

Dependence Analysis for Java.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

The Jalapeño Dynamic Optimizing Compiler for Java.
Proceedings of the ACM 1999 Conference on Java Grande, JAVA '99, San Francisco, CA, USA, 1999

1998
Enabling Sparse Constant Propagation of Array Elements via Array SSA Form.
Proceedings of the Static Analysis, 5th International Symposium, 1998

Array SSA Form and Its Use in Parallelization.
Proceedings of the POPL '98, 1998

Lightweight Object-Oriented Shared Variables for Distributed Applications on the Internet.
Proceedings of the 1998 ACM SIGPLAN Conference on Object-Oriented Programming Systems, 1998

Loop Transformations for Hierarchical Parallelism and Locality.
Proceedings of the Languages, 1998

Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 1998

Space-Time Scheduling of Instruction-Level Parallelism on a Raw Machine.
Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998

1997
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers.
IBM J. Res. Dev., 1997

Baring It All to Software: Raw Machines.
Computer, 1997

Optimal Weighted Loop Fusion for Parallel Programs.
Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures, 1997

Analysis and Optimization of Explicitly Parallel Programs Using the Parallel Program Graph Representation.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

On the Importance of an End-To-End View of Memory Consistency in Future Computer Systems.
Proceedings of the High Performance Computing, International Symposium, 1997

False Sharing Elimination by Selection of Runtime Scheduling Parameters.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

1996
Anticipatory Instruction Scheduling.
Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures, 1996

Locality Analysis for Distributed Shared-Memory Multiprocessors.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

Incremental Computation of Static Single Assignment Form.
Proceedings of the Compiler Construction, 6th International Conference, 1996

Automatic parallelization for symmetric shared-memory multiprocessors.
Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative Research, 1996

1995
Scheduling Iterative Task Computation on Message-Passing Architectures.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Mapping Iterative Task Graphs on Distributed Memory Machines.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

Location Consistency: Stepping Beyond the Memory Coherence Barrier.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

Optimized code restructuring of OS/2 executables.
Proceedings of the 1995 Conference of the Centre for Advanced Studies on Collaborative Research, 1995

1994
Programming, compilation, and resource management issues for multithreading (panel session II).
SIGARCH Comput. Archit. News, 1994

Is there a future for functional languages in parallel programming?
Proceedings of the IEEE Computer Society 1994 International Conference on Computer Languages, 1994

An Optimal Asynchronous Scheduling Algorithm for Software Cache Consistence.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

Automatic Localization for Distributed-Memory Multiprocessors Using a Shared-Memory Compilation Framework.
Proceedings of the 27th Annual Hawaii International Conference on System Sciences (HICSS-27), 1994

A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness.
Proceedings of the 1994 Conference of the Centre for Advanced Studies on Collaborative Research, October 31, 1994

1993
Parallel Program Graphs and their Classification.
Proceedings of the Languages and Compilers for Parallel Computing, 1993

1992
A General Framework for Iteration-Reordering Loop Transformations.
Proceedings of the ACM SIGPLAN'92 Conference on Programming Language Design and Implementation (PLDI), 1992

A Concurrent Execution Semantics for Parallel Program Graphs and Program Dependence Graphs.
Proceedings of the Languages and Compilers for Parallel Computing, 1992

Collective Loop Fusion for Array Contraction.
Proceedings of the Languages and Compilers for Parallel Computing, 1992

1991
Automatic partitioning of a program dependence graph into parallel tasks.
IBM J. Res. Dev., 1991

On Estimating and Enhancing Cache Effectiveness.
Proceedings of the Languages and Compilers for Parallel Computing, 1991

Optimization of array accesses by collective loop transformations.
Proceedings of the 5th international conference on Supercomputing, 1991

1990
Instruction Reordering for Fork-Join Parallelism.
Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design and Implementation (PLDI), 1990

Compact Representations for Control Dependence.
Proceedings of the ACM SIGPLAN'90 Conference on Programming Language Design and Implementation (PLDI), 1990

POSC - a partitioning and optimizing SISAL compiler.
Proceedings of the 4th international conference on Supercomputing, 1990

1989
Determining Average Program Execution Times and their Variance.
Proceedings of the ACM SIGPLAN'89 Conference on Programming Language Design and Implementation (PLDI), 1989

1988
Automatic Discovery of Parallelism: A Tool and an Experiment (Extended Abstract).
Proceedings of the ACM/SIGPLAN PPEALS 1988, 1988

A Simple and Efficient Implementation Approach for Single Assignment Languages.
Proceedings of the 1988 ACM Conference on LISP and Functional Programming, 1988

Synchronization using counting semaphores.
Proceedings of the 2nd international conference on Supercomputing, 1988

Processor Scheduling Algorithms for Constraint-Satisfaction Search Problems.
Proceedings of the International Conference on Parallel Processing, 1988

1986
Compile-time partitioning and scheduling of parallel programs.
Proceedings of the 1986 SIGPLAN Symposium on Compiler Construction, 1986

Partitioning Parallel Programs for Macro-Dataflow.
Proceedings of the 1986 ACM Conference on LISP and Functional Programming, 1986

