J. Ramanujam

Orcid: 0000-0002-4349-1327

  • Louisiana State University, Baton Rouge, LA, USA

According to our database, J. Ramanujam authored at least 177 papers between 1988 and 2024.

Insights from Augmented Data Integration and Strong Regularization in Drug Synergy Prediction with SynerGNet.
Mach. Learn. Knowl. Extr., September, 2024

Deep video representation learning: a survey.
Multim. Tools Appl., June, 2024

Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach.
CoRR, 2024

Learning Dynamic Representations in Large Language Models for Evolving Data Streams.
Proceedings of the Pattern Recognition - 27th International Conference, 2024

GraphDTI: A robust deep learning predictor of drug-target interactions from multiple heterogeneous data.
J. Cheminformatics, 2021

BionoiNet: ligand-binding site classification with off-the-shelf deep neural network.
Bioinform., 2020

Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modeling.
ACM Trans. Math. Softw., 2019

Toward a more dependable hybrid analysis of android malware using aspect-oriented programming.
Comput. Secur., 2018

Gaslight: A comprehensive fuzzing architecture for memory forensics frameworks.
Digit. Investig., 2017

Automated Tiling of Unstructured Mesh Computations with Application to Seismological Modelling.
CoRR, 2017

HPX Smart Executors.
Proceedings of the Third International Workshop on Extreme Scale Programming Models and Middleware, 2017

Redesigning OP2 Compiler to Use HPX Runtime Asynchronous Techniques.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

A Load-Balanced Parallel and Distributed Sorting Algorithm Implemented with PGX.D.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, 2017

Improving the Parallel Performance of an NBody Application Using Adaptive Techniques in HPX.
Proceedings of the 19th IEEE International Conference on High Performance Computing and Communications; 15th IEEE International Conference on Smart City; 3rd IEEE International Conference on Data Science and Systems, 2017

Assessing the similarity of ligand binding conformations with the Contact Mode Score.
Comput. Biol. Chem., 2016

A Massively Parallel Distributed N-body Application Implemented with HPX.
Proceedings of the 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, 2016

Effective padding of multidimensional arrays to avoid cache conflict misses.
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016

Using HPX and OP2 for Improving Parallel Scaling Performance of Unstructured Grid Applications.
Proceedings of the 45th International Conference on Parallel Processing Workshops, 2016

Introduction to the Special Issue on PPoPP'12.
ACM Trans. Parallel Comput., 2015

GeauxDock: A novel approach for mixed-resolution ligand docking using a descriptor-based force field.
J. Comput. Chem., 2015

SDSLc: a multi-target domain-specific compiler for stencil computations.
Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015

Lost in heterogeneity: architectural selection based on code features.
Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, 2015

Distributed memory code generation for mixed Irregular/Regular computations.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

On Characterizing the Data Access Complexity of Programs.
Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015

Optimistic Delinearization of Parametrically Sized Arrays.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Automatic parallelization of a class of irregular loops for distributed memory systems.
ACM Trans. Parallel Comput., 2014

Cross-Loop Optimization of Arithmetic Intensity for Finite Element Local Assembly.
ACM Trans. Archit. Code Optim., 2014

On Using the Roofline Model with Lower Bounds on Data Movement.
ACM Trans. Archit. Code Optim., 2014

Introduction to the JPDC Special Issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.
J. Parallel Distributed Comput., 2014

Parallel tempering simulation of the three-dimensional Edwards-Anderson model with compact asynchronous multispin coding on GPU.
Comput. Phys. Commun., 2014

COFFEE: an Optimizing Compiler for Finite Element Local Assembly.
CoRR, 2014

DA-TC: a novel application execution model in multicluster systems.
Clust. Comput., 2014

On characterizing the data movement complexity of computational DAGs for parallel execution.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014

A framework for enhancing data reuse via associative reordering.
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014

Generalizing Run-Time Tiling with the Loop Chain Abstraction.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential.
ACM Trans. Archit. Code Optim., 2013

Adaptive parallel tiled code generation and accelerated auto-tuning.
Int. J. High Perform. Comput. Appl., 2013

Parametric GPU Code Generation for Affine Loop Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 2013

A stencil compiler for short-vector SIMD architectures.
Proceedings of the International Conference on Supercomputing, 2013

Split tiling for GPUs: automatic parallelization using trapezoidal tiles.
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013

An ILP solution to address code generation for embedded applications on digital signal processors.
ACM Trans. Design Autom. Electr. Syst., 2012

Storage Optimization through Offset Assignment with Variable Coalescing.
ACM Trans. Embed. Comput. Syst., 2012

An Effective Solution to Task Scheduling and Memory Partitioning for Multiprocessor System-on-Chip.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2012

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions.
J. Parallel Distributed Comput., 2012

Code Size Reduction for Array Intensive Applications on Digital Signal Processors.
J. Circuits Syst. Comput., 2012

Code generation for parallel execution of a class of irregular loops on distributed memory systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Analytical Bounds for Optimal Tile Size Selection.
Proceedings of the Compiler Construction - 21st International Conference, 2012

Loop transformations: convexity, pruning and optimization.
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011

Dynamic selection of tile sizes.
Proceedings of the 18th International Conference on High Performance Computing, 2011

Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures.
Proceedings of the Compiler Construction - 20th International Conference, 2011

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework.
Proceedings of the Conference on High Performance Computing Networking, 2010

DynTile: Parametric tiled loop generation for parallel execution on multicore processors.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Parameterized tiling revisited.
Proceedings of the CGO 2010, 2010

Automatic C-to-CUDA Code Generation for Affine Programs.
Proceedings of the Compiler Construction, 19th International Conference, 2010

Decoupling interaction hardware design using libraries of reusable electronics.
Proceedings of the 3rd International Conference on Tangible and Embedded Interaction 2009, 2009

Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

Parametric multi-level tiling of imperfectly nested loops.
Proceedings of the 23rd international conference on Supercomputing, 2009

A Framework for Task Scheduling and Memory Partitioning for Multi-Processor System-on-Chip.
Proceedings of the High Performance Embedded Architectures and Compilers, 2009

An innovative application execution toolkit for multicluster grids.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors.
Proceedings of the PACT 2009, 2009

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008

A practical automatic polyhedral parallelizer and locality optimizer.
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008

Towards effective automatic parallelization for multicore systems.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

A compiler framework for optimization of affine loop nests for GPGPUs.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Scheduling DAGs for Fixed-point DSP Processors by Using Worm Partitions.
Proceedings of the International Conference on Embedded Software and Systems, 2008

Address Register Allocation in Digital Signal Processors.
Proceedings of the International Conference on Embedded Software and Systems, 2008

Storage optimization through code size reduction for digital signal processors.
Proceedings of the 6th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2008

Optimal address register allocation for arrays in DSP applications.
Proceedings of the 6th IEEE/ACM/IFIP Workshop on Embedded Systems for Real-Time Multimedia, 2008

Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.
Proceedings of the Compiler Construction, 17th International Conference, 2008

Efficient search-space pruning for integrated fusion and tiling transformations.
Concurr. Comput. Pract. Exp., 2007

Code Size Optimization for Embedded Processors using Commutative Transformations.
Proceedings of the 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA 2007), 2007

Automatic mapping of nested loops to FPGAs.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Effective automatic parallelization of stencil computations.
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007

Memory Offset Assignment for DSPs.
Proceedings of the Embedded Software and Systems, [Third] International Conference, 2007

Estimating and reducing the memory requirements of signal processing codes for embedded systems.
IEEE Trans. Signal Process., 2006

Improving the energy behavior of block buffering using compiler optimizations.
ACM Trans. Design Autom. Electr. Syst., 2006

Reducing code size through address register assignment.
ACM Trans. Embed. Comput. Syst., 2006

Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver.
J. Parallel Distributed Comput., 2006

An Effective Heuristic for Simple Offset Assignment with Variable Coalescing.
Proceedings of the Languages and Compilers for Parallel Computing, 2006

Memory minimization for tensor contractions using integer linear programming.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations.
Proceedings of the Computational Science, 2006

Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models.
Proc. IEEE, 2005

Performance modeling and optimization of parallel out-of-core tensor contractions.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005

Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations.
Proceedings of the Computational Science, 2005

A compiler-based approach for dynamically managing scratch-pad memories in embedded systems.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2004

Empirical Performance-Model Driven Data Layout Optimization.
Proceedings of the Languages and Compilers for High Performance Computing, 2004

Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework.
IEEE Trans. Parallel Distributed Syst., 2003

Memory-Constrained Data Locality Optimization for Tensor Contractions.
Proceedings of the Languages and Compilers for Parallel Computing, 2003

Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms.
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003

Address Register Assignment for Reducing Code Size.
Proceedings of the Compiler Construction, 12th International Conference, 2003

An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets.
J. Supercomput., 2002

Address Code and Arithmetic Optimizations for Embedded Systems.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A Heuristic for Clock Selection in High-Level Synthesis.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

Strategies for Improving Data Locality in Embedded Applications.
Proceedings of the 7th Asia and South Pacific Design Automation Conference (ASP-DAC 2002), 2002

A high-level approach to synthesis of high-performance codes for quantum chemistry.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Space-Time Trade-Off Optimization for a Class of Electronic Structure Calculations.
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002

Memory-Constrained Communication Minimization for a Class of Array Computations.
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002

A Performance Optimization Framework for Compilation of Tensor Contraction Expressions into Parallel Programs.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002

Exploiting shared scratch pad memory space in embedded multiprocessor systems.
Proceedings of the 39th Design Automation Conference, 2002

Automatic Data Distribution.
Proceedings of the Compiler Design Handbook: Optimizations and Machine Code Generation, 2002

Static and Dynamic Locality Optimizations Using Integer Linear Programming.
IEEE Trans. Parallel Distributed Syst., 2001

A fast approach to computing exact solutions to the resource-constrained scheduling problem.
ACM Trans. Design Autom. Electr. Syst., 2001

Compact and efficient code generation through program restructuring on limited memory embedded DSPs.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2001

A Layout-Conscious Iteration Space Transformation Technique.
IEEE Trans. Computers, 2001

Data Relation Vectors: A New Abstraction for Data Optimizations.
IEEE Trans. Computers, 2001

Morphable Cache Architectures: Potential Benefits.
Proceedings of the 2001 ACM SIGPLAN Workshop on Optimization of Middleware and Distributed Systems, 2001

Compiler support for block buffering.
Proceedings of the 2001 International Symposium on Low Power Electronics and Design, 2001

Loop optimization for a class of memory-constrained computations.
Proceedings of the 15th international conference on Supercomputing, 2001

Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001

Reducing Memory Requirements of Nested Loops for Embedded Systems.
Proceedings of the 38th Design Automation Conference, 2001

Dynamic Management of Scratch-Pad Memory Space.
Proceedings of the 38th Design Automation Conference, 2001

Integer Lattice Based Methods for Local Address Generation for Block-Cyclic Distributions.
Proceedings of the Compiler Optimizations for Scalable Parallel Systems Languages, 2001

A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations.
IEEE Trans. Parallel Distributed Syst., 2000

Minimizing Data and Synchronization Costs in One-Way Communication.
IEEE Trans. Parallel Distributed Syst., 2000

Compiler Algorithms for Optimizing Locality and Parallelism on Shared and Distributed-Memory Machines.
J. Parallel Distributed Comput., 2000

Improving Offset Assignment for Embedded Processors.
Proceedings of the Languages and Compilers for Parallel Computing, 2000

Improving Offset Assignment on Embedded Processors Using Transformations.
Proceedings of the High Performance Computing, 2000

On lower bounds for scheduling problems in high-level synthesis.
Proceedings of the 37th Conference on Design Automation, 2000

A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts.
IEEE Trans. Parallel Distributed Syst., 1999

A global communication optimization technique based on data-flow analysis and linear algebra.
ACM Trans. Program. Lang. Syst., 1999

Improving Cache Locality by a Combination of Loop and Data Transformation.
IEEE Trans. Computers, 1999

A Matrix-Based Approach to Global Locality Optimization.
J. Parallel Distributed Comput., 1999

Improving Locality Using a Graph-Based Technique for Detecting Memory Layouts of Arrays.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Code Restructuring for Improving Real Time Response through Code Speed, Size Trade-offs on Limited Memory Embedded DSPs.
Proceedings of the Languages and Compilers for Parallel Computing, 1999

A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

An integer linear programming approach for optimizing cache locality.
Proceedings of the 13th international conference on Supercomputing, 1999

A Framework for Interprocedural Locality Optimization Using Both Loop and Data Layout Transformations.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Compiler Optimizations for I/O-Intensive Computations.
Proceedings of the International Conference on Parallel Processing 1999, 1999

Restructuring I/O-Intensive Computations for Locality.
Proceedings of the High-Performance Computing and Networking, 7th International Conference, 1999

I/O-Conscious Tiling for Disk-Resident Data Sets.
Proceedings of the Euro-Par '99 Parallel Processing, 5th International Euro-Par Conference, Toulouse, France, August 31, 1999

On Reducing False Sharing while Improving Locality on Shared Memory Multiprocessors.
Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, 1999

Compilation Techniques for Out-of-Core Parallel Computations.
Parallel Comput., 1998

Locality Optimization Algorithms for Compilation of Out-of-Core Codes.
J. Inf. Sci. Eng., 1998

Partitioning Graphs on Message-Passing Machines by Pairwise Mincut.
Inf. Sci., 1998

Improving Locality Using Loop and Data Transformations in an Integrated Framework.
Proceedings of the 31st Annual IEEE/ACM International Symposium on Microarchitecture, 1998

Improving Locality in Out-of-Core Computations Using Data Layout Transformations.
Proceedings of the Languages, 1998

A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality.
Proceedings of the Languages and Compilers for Parallel Computing, 1998

A Generalized Framework for Global Communication Optimization.
Proceedings of the 12th International Parallel Processing Symposium / 9th Symposium on Parallel and Distributed Processing (IPPS/SPDP '98), March 30, 1998

A Hyperplane Based Approach for Optimizing Spatial Locality in Loop Nests.
Proceedings of the 12th international conference on Supercomputing, 1998

Improving the computational performance of ILP-based problems.
Proceedings of the 1998 IEEE/ACM International Conference on Computer-Aided Design, 1998

Efficient address sequence generation for two-level mappings in High Performance Fortran.
Proceedings of the 5th International Conference On High Performance Computing, 1998

Enhancing Spatial Locality via Data Layout Optimizations.
Proceedings of the Euro-Par '98 Parallel Processing, 1998

A Matrix-Based Approach to the Global Locality Optimization Problem.
Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, 1998

Communication Generation for Block-Cyclic Distributions.
Parallel Process. Lett., 1997

Code Generation for Complex Subscripts in Data-Parallel Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1997

A Unified Compiler Algorithm for Optimizing Locality, Parallelism and Communication in Out-of-core Computations.
Proceedings of the Fifth Workshop on I/O in Parallel and Distributed Systems, 1997

A Compiler Algorithm for Optimizing Locality in Loop Nests.
Proceedings of the 11th international conference on Supercomputing, 1997

Improving the Performance of Out-of-Core Computations.
Proceedings of the 1997 International Conference on Parallel Processing (ICPP '97), 1997

Optimization of Out-of-Core Computations Using Chain Vectors.
Proceedings of the Euro-Par '97 Parallel Processing, 1997

A neural architecture for a class of abduction problems.
IEEE Trans. Syst. Man Cybern. Part B, 1996

Efficient Algorithms for Array Redistribution.
IEEE Trans. Parallel Distributed Syst., 1996

Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors.
J. Parallel Distributed Comput., 1996

Compilation and Communication Strategies for Out-of-Core Programs on Distributed Memory Machines.
J. Parallel Distributed Comput., 1996

Generalized Overlap Regions for Communication Optimization in Data-Parallel Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1996

Automatic Optimization of Communication in Compiling Out-of-Core Stencil Codes.
Proceedings of the 10th international conference on Supercomputing, 1996

A Framework for Integrated Communication and I/O Placement.
Proceedings of the Euro-Par '96 Parallel Processing, 1996

Beyond unimodular transformations.
J. Supercomput., 1995

Mapping combinatorial optimization problems onto neural networks.
Inf. Sci., 1995

Integrating Data Distribution and Loop Transformations.
Proceedings of the Seventh SIAM Conference on Parallel Processing for Scientific Computing, 1995

Communication Generation and Optimization for HPF.
Proceedings of the Languages, 1995

Fast Address Sequence Generation for Data-Parallel Programs Using Integer Lattices.
Proceedings of the Languages and Compilers for Parallel Computing, 1995

Statement-level independent partitioning of uniform recurrences.
Proceedings of IPPS '95, 1995

Multi-phase array redistribution: modeling and evaluation.
Proceedings of IPPS '95, 1995

Analysis of Event Synchronization in Parallel Programs.
Proceedings of the Languages and Compilers for Parallel Computing, 1994

Optimal Software Pipelining of Nested Loops.
Proceedings of the 8th International Symposium on Parallel Processing, 1994

Tiling Multidimensional Iteration Spaces for Multicomputers.
J. Parallel Distributed Comput., 1992

Non-Unimodular Transformations of Nested Loops.
Proceedings of Supercomputing '92, 1992

Compile-Time Techniques for Data Distribution in Distributed Memory Machines.
IEEE Trans. Parallel Distributed Syst., 1991

Tiling multidimensional iteration spaces for nonshared memory machines.
Proceedings of Supercomputing '91, 1991

A Linear Algebraic View of Loop Transformations and Their Interaction.
Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing, 1991

Cluster partitioning approaches to mapping parallel programs onto a hypercube.
Parallel Comput., 1990

Task Allocation onto a Hypercube by Recursive Mincut Bipartitioning.
J. Parallel Distributed Comput., 1990

Tiling of Iteration Spaces for Multicomputers.
Proceedings of the 1990 International Conference on Parallel Processing, 1990

A methodology for parallelizing programs for multicomputers and complex memory multiprocessors.
Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989

Optimization by neural networks.
Proceedings of International Conference on Neural Networks (ICNN'88), 1988

Towards a 'neural' architecture for abductive reasoning.
Proceedings of International Conference on Neural Networks (ICNN'88), 1988
