P. Sadayappan
Orcid: 0000-0002-4737-2034
Affiliations:
- University of Utah, Salt Lake City, UT, USA
- Ohio State University, Columbus, USA (former)
According to our database, P. Sadayappan authored at least 347 papers between 1985 and 2024.
Awards
IEEE Fellow 2015, "For contributions to parallel programming tools for high-performance computing".
Links
Online presence:
- on orcid.org
- on cs.utah.edu
Bibliography
2024
An Empirical Investigation of Matrix Factorization Methods for Pre-trained Transformers.
CoRR, 2024
Proceedings of the 38th ACM International Conference on Supercomputing, 2024
2023
ACM Trans. Archit. Code Optim., June, 2023
Multi-discretization domain specific language and code generation for differential equations.
J. Comput. Sci., April, 2023
Automating GPU Scalability for Complex Scientific Models: Phonon Boltzman Transport Equation.
CoRR, 2023
Proceedings of the International Conference for High Performance Computing, 2023
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023
TDC: Towards Extremely Efficient CNNs on GPUs via Hardware-Aware Tucker Decomposition.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the 37th International Conference on Supercomputing, 2023
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023
2022
Dagstuhl Reports, 2022
Proceedings of the SC22: International Conference for High Performance Computing, 2022
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Comprehensive Accelerator-Dataflow Co-design Optimization for Convolutional Neural Networks.
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022
Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution.
Proceedings of the CC '22: 31st ACM SIGPLAN International Conference on Compiler Construction, Seoul, South Korea, April 2, 2022
Effective Performance Modeling and Domain-Specific Compiler Optimization of CNNs for GPUs.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, 2022
2021
Proceedings of the SPAA '21: 33rd ACM Symposium on Parallelism in Algorithms and Architectures, 2021
Proceedings of the PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2021
Proceedings of the ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021
2020
Proceedings of the International Conference for High Performance Computing, 2020
Scalable heterogeneous execution of a coupled-cluster model with perturbative triples.
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the International Conference for High Performance Computing, 2020
Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language Design and Implementation, 2020
Proceedings of the KDD '20: The 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020
2019
Proceedings of the International Conference for High Performance Computing, 2019
Parallel Data-Local Training for Optimizing Word2Vec Embeddings for Word and Graph Embeddings.
Proceedings of the 2019 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2019
Proceedings of the International Conference for High Performance Computing, 2019
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019
Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019
2018
Domain-Specific Optimization and Generation of High-Performance GPU Code for Stencil Computations.
Proc. IEEE, 2018
Proc. ACM Program. Lang., 2018
Proceedings of the International Conference for High Performance Computing, 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018
Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2018
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
Effective Machine Learning Based Format Selection and Performance Modeling for SpMV on GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018
Proceedings of the 32nd International Conference on Supercomputing, 2018
Proceedings of the Computational Science - ICCS 2018, 2018
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018
Proceedings of the 25th IEEE International Conference on High Performance Computing, 2018
2017
Optimizing the Four-Index Integral Transform Using Data Movement Lower Bounds Analysis.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2017
Proceedings of the General Purpose GPUs, 2017
Proceedings of the Languages and Compilers for Parallel Computing, 2017
Proceedings of the International Conference on Supercomputing, 2017
Proceedings of the 24th IEEE International Conference on High Performance Computing Workshops, 2017
Characterization of Data Movement Requirements for Sparse Matrix Computations on GPUs.
Proceedings of the 24th IEEE International Conference on High Performance Computing, 2017
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques, 2017
2016
ACM Trans. Archit. Code Optim., 2016
Global-view coefficients: a data management solution for parallel quantum Monte Carlo applications.
Concurr. Comput. Pract. Exp., 2016
Work stealing for GPU-accelerated parallel programs in a global address space framework.
Concurr. Comput. Pract. Exp., 2016
Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, 2016
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment.
Proceedings of the International Conference for High Performance Computing, 2016
PIPES: a language and compiler for task-based programming on distributed-memory clusters.
Proceedings of the International Conference for High Performance Computing, 2016
Effective resource management for enhancing performance of 2D and 3D stencils on GPUs.
Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, 2016
PolyCheck: dynamic verification of iteration space transformations on affine programs.
Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2016
Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Differentiated Scheduling of Response-Critical and Best-Effort Wide-Area Data Transfers.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
Proceedings of the 23rd IEEE International Conference on High Performance Computing, 2016
Proceedings of the 25th International Conference on Compiler Construction, 2016
Register allocation and promotion through combined instruction scheduling and loop unrolling.
Proceedings of the 25th International Conference on Compiler Construction, 2016
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016
2015
A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs.
J. Parallel Distributed Comput., 2015
Proceedings of the 5th International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing, 2015
Proceedings of the International Conference for High Performance Computing, 2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015
Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015
Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2015
2014
Automatic parallelization of a class of irregular loops for distributed memory systems.
ACM Trans. Parallel Comput., 2014
ACM Trans. Archit. Code Optim., 2014
ACM Trans. Archit. Code Optim., 2014
Parallel Process. Lett., 2014
Introduction to the JPDC Special Issue on Domain-Specific Languages and High-Level Frameworks for High-Performance Computing.
J. Parallel Distributed Comput., 2014
On characterizing the data movement complexity of computational DAGs for parallel execution.
Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the International Conference for High Performance Computing, 2014
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2014
Proceedings of the SPLASH'14, 2014
An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs.
Proceedings of the 2014 International Conference on Supercomputing, 2014
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014
Proceedings of the 43rd International Conference on Parallel Processing Workshops, 2014
Proceedings of the 43rd International Conference on Parallel Processing, 2014
Proceedings of the 21st International Conference on High Performance Computing, 2014
Proceedings of the 12th Annual IEEE/ACM International Symposium on Code Generation and Optimization, 2014
Proceedings of the 14th IEEE/ACM International Symposium on Cluster, 2014
Proceedings of the 2014 IEEE International Conference on Big Data (IEEE BigData 2014), 2014
2013
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential.
ACM Trans. Archit. Code Optim., 2013
Int. J. Parallel Program., 2013
Int. J. High Perform. Comput. Appl., 2013
A framework for load balancing of tensor contraction expressions via dynamic task partitioning.
Proceedings of the International Conference for High Performance Computing, 2013
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2013
Proceedings of the Languages and Compilers for Parallel Computing, 2013
Proceedings of the 2013 IEEE International Symposium on Parallel & Distributed Processing, 2013
Proceedings of the International Conference on Supercomputing, 2013
Stratification driven placement of complex data: A framework for distributed data analytics.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013
Proceedings of the 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, 2013
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, 2013
2012
ACM Trans. Archit. Code Optim., 2012
Proceedings of the International Conference on Computational Science, 2012
Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions.
J. Parallel Distributed Comput., 2012
Code generation for parallel execution of a class of irregular loops on distributed memory systems.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Proceedings of the 2012 SC Companion: High Performance Computing, 2012
Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, 2012
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Load Balancing of Dynamical Nucleation Theory Monte Carlo Simulations through Resource Sharing Barriers.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012
Proceedings of the International Conference on Supercomputing, 2012
A global address space approach to automated data management for parallel Quantum Monte Carlo applications.
Proceedings of the 19th International Conference on High Performance Computing, 2012
Proceedings of the Compiler Construction - 21st International Conference, 2012
High-performance sparse matrix-vector multiplication on GPUs for structured grid computations.
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units, 2012
2011
Proc. VLDB Endow., 2011
Parallel Comput., 2011
Comput. Lang. Syst. Struct., 2011
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011
Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2011
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Proceedings of the 18th International Conference on High Performance Computing, 2011
Proceedings of the Euro-Par 2011 Parallel Processing - 17th International Conference, 2011
Proceedings of the CGO 2011, 2011
Data Layout Transformation for Stencil Computations on Short-Vector SIMD Architectures.
Proceedings of the Compiler Construction - 20th International Conference, 2011
Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011
2010
Parameterized specification, configuration and execution of data-intensive scientific workflows.
Clust. Comput., 2010
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework.
Proceedings of the Conference on High Performance Computing Networking, 2010
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
DynTile: Parametric tiled loop generation for parallel execution on multicore processors.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010
Proceedings of the 26th IEEE International Conference on Software Maintenance (ICSM 2010), 2010
Proceedings of the 39th International Conference on Parallel Processing, 2010
Proceedings of the 7th Conference on Computing Frontiers, 2010
Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010
Proceedings of the Compiler Construction, 19th International Conference, 2010
2009
An Integrated Approach to Locality-Conscious Processor Allocation and Scheduling of Mixed-Parallel Applications.
IEEE Trans. Parallel Distributed Syst., 2009
Enabling software management for multicore caches with a lightweight hardware support.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009
Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Proceedings of the 23rd international conference on Supercomputing, 2009
Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing, 2009
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009
Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning.
Proceedings of the PACT 2009, 2009
Proceedings of the PACT 2009, 2009
2008
Simul. Model. Pract. Theory, 2008
A framework for characterizing overlap of communication and computation in parallel applications.
Clust. Comput., 2008
Global trees: a framework for linked data structures on distributed memory parallel systems.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008
Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories.
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008
Proceedings of the ACM SIGPLAN 2008 Conference on Programming Language Design and Implementation, 2008
A dynamic scheduling approach for coordinated wide-area data transfers using GridFTP.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008
A Duplication Based Algorithm for Optimizing Latency Under Throughput Constraints for Streaming Workflows.
Proceedings of the 2008 International Conference on Parallel Processing, 2008
Proceedings of the 2008 International Conference on Parallel Processing, 2008
Proceedings of the Computational Science, 2008
Multi-hop path splitting and multi-pathing optimizations for data transfers over shared wide-area networks using gridFTP.
Proceedings of the 17th International Symposium on High-Performance Distributed Computing (HPDC-17 2008), 2008
Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems.
Proceedings of the 14th International Conference on High-Performance Computer Architecture (HPCA-14 2008), 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
Proceedings of the 2008 IEEE International Conference on Cluster Computing, 29 September, 2008
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model.
Proceedings of the Compiler Construction, 17th International Conference, 2008
2007
Concurr. Comput. Pract. Exp., 2007
Proceedings of the ACM/IEEE Conference on High Performance Networking and Computing, 2007
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007
Proceedings of the ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation, 2007
A global address space framework for locality aware scheduling of block-sparse computations.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007
Toward Optimizing Latency Under Throughput Constraints for Application Workflows on Clusters.
Proceedings of the Euro-Par 2007, 2007
Proceedings of the Euro-Par 2007, 2007
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007
2006
J. Supercomput., 2006
MOLAR: adaptive runtime support for high-end computing operating and runtime systems.
ACM SIGOPS Oper. Syst. Rev., 2006
J. Parallel Distributed Comput., 2006
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Data management and query - Hypergraph partitioning for automatic memory hierarchy management.
Proceedings of the ACM/IEEE SC2006 Conference on High Performance Networking and Computing, 2006
Proceedings of the Languages and Compilers for Parallel Computing, 2006
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2006
A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing.
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2006
An approach to locality-conscious load balancing and transparent memory hierarchy management with a global-address-space parallel programming model.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
An extensible global address space framework with decoupled task and data abstractions.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006
An Integrated Approach for Processor Allocation and Scheduling of Mixed-Parallel Applications.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006
Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations.
Proceedings of the Computational Science, 2006
Proceedings of the 15th IEEE International Symposium on High Performance Distributed Computing, 2006
Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM 2006), 2006
Locality Conscious Processor Allocation and Scheduling for Mixed Parallel Applications.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006
A Performance Instrumentation Framework to Characterize Computation-Communication Overlap in Message-Passing Systems.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006
Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques (PACT 2006), 2006
2005
Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models.
Proc. IEEE, 2005
Int. J. High Perform. Comput. Netw., 2005
Integrated Loop Optimizations for Data Locality Enhancement of Tensor Contraction Expressions.
Proceedings of the ACM/IEEE SC2005 Conference on High Performance Networking and Computing, 2005
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2005
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2005
Cache Miss Characterization and Data Locality Optimization for Imperfectly Nested Loops on Shared Memory Multiprocessors.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Proceedings of the 34th International Conference on Parallel Processing Workshops (ICPP 2005 Workshops), 2005
Automated Operation Minimization of Tensor Contraction Expressions in Electronic Structure Calculations.
Proceedings of the Computational Science, 2005
Proceedings of the 14th IEEE International Symposium on High Performance Distributed Computing, 2005
Proceedings of the High Performance Computing, 2005
A hypergraph partitioning based approach for scheduling of tasks with batch-shared I/O.
Proceedings of the 5th International Symposium on Cluster Computing and the Grid (CCGrid 2005), 2005
2004
Int. J. High Perform. Comput. Netw., 2004
Int. J. High Perform. Comput. Netw., 2004
Proceedings of the Languages and Compilers for High Performance Computing, 2004
Proceedings of the 33rd International Conference on Parallel Processing Workshops (ICPP 2004 Workshops), 2004
Message from the Chairs: International Workshop on Compile and Run Time Techniques for Parallel Computing.
Proceedings of the 33rd International Conference on Parallel Processing Workshops (ICPP 2004 Workshops), 2004
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004
Proceedings of the High Performance Computing, 2004
Proceedings of the 5th International Workshop on Grid Computing (GRID 2004), 2004
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004
Proceedings of the 2004 IEEE International Conference on Cluster Computing (CLUSTER 2004), 2004
2003
Evaluating the Impact of Programming Language Features on the Performance of Parallel Applications on Cluster Architectures.
Proceedings of the Languages and Compilers for Parallel Computing, 2003
Proceedings of the Languages and Compilers for Parallel Computing, 2003
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2003
Global Communication Optimization for Tensor Contraction Expressions under Memory Constraints.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003
Proceedings of the High Performance Computing - HiPC 2003, 10th International Conference, 2003
Proceedings of the 2003 IEEE International Conference on Cluster Computing (CLUSTER 2003), 2003
2002
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002
Proceedings of the 2002 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2002
Proceedings of the Languages and Compilers for Parallel Computing, 15th Workshop, 2002
Proceedings of the Job Scheduling Strategies for Parallel Processing, 2002
A Performance Optimization Framework for Compilation of Tensor Contraction Expressions into Parallel Programs.
Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), 2002
Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002
Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002
Proceedings of the 31st International Conference on Parallel Processing Workshops (ICPP 2002 Workshops), 2002
Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS'02), 2002
Distributed Job Scheduling on Computational Grids Using Multiple Simultaneous Requests.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002
Proceedings of the High Performance Computing, 2002
Proceedings of the 2002 IEEE International Conference on Cluster Computing (CLUSTER 2002), 2002
2001
Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
Proceedings of the 15th international conference on Supercomputing, 2001
Proceedings of the 2001 International Conference on Parallel Processing, 2001
Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation.
Proceedings of the 2001 International Conference on Parallel Processing, 2001
Towards Automatic Synthesis of High-Performance Codes for Electronic Structure Calculations: Data Locality Optimization.
Proceedings of the High Performance Computing - HiPC 2001, 8th International Conference, 2001
2000
Characterization and Enhancement of Dynamic Mapping Heuristics for Heterogeneous Systems.
Proceedings of the 2000 International Workshop on Parallel Processing, 2000
Proceedings of the 2000 International Workshop on Parallel Processing, 2000
Characterization and enhancement of Static Mapping Heuristics for Heterogeneous Systems.
Proceedings of the High Performance Computing, 2000
Proceedings of the Network-Based Parallel Computing: Communication, 2000
Proceedings of the Network-Based Parallel Computing: Communication, 2000
1999
Performance Optimization of a Class of Loops Involving Sums of Products of Sparse Arrays.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals.
Proceedings of the Languages and Compilers for Parallel Computing, 1999
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999
An Incremental Methodology for Parallelizing Legacy Stencil Codes on Message-Passing Computers.
Proceedings of the International Conference on Parallel Processing 1999, 1999
Proceedings of the High Performance Computing, 1999
Communication Modeling of Heterogeneous Networks of Workstations for Performance Characterization of Collective Operations.
Proceedings of the 8th Heterogeneous Computing Workshop, 1999
Proceedings of the Network-Based Parallel Computing: Communication, 1999
1998
A technique for overlapping computation and communication for block recursive algorithms.
Concurr. Pract. Exp., 1998
1997
On Optimizing a Class of Multi-Dimensional Loops with Reductions for Parallel Execution.
Parallel Process. Lett., 1997
Optimal Algorithms for All-to-All Personalized Communication on Rings and Two Dimensional Tori.
J. Parallel Distributed Comput., 1997
Optimization of a Class of Multi-Dimensional Integrals on Parallel Machines.
Proceedings of the Eighth SIAM Conference on Parallel Processing for Scientific Computing, 1997
Proceedings of the Fourth International Conference on High-Performance Computing, 1997
1996
Efficient Index Set Generation for Compiling HPF Array Statements on Distributed-Memory Machines.
J. Parallel Distributed Comput., 1996
J. Parallel Distributed Comput., 1996
A Framework for Generating Distributed-Memory Parallel Programs for Block Recursive Algorithms.
J. Parallel Distributed Comput., 1996
J. Inf. Sci. Eng., 1996
Proceedings of the Languages and Compilers for Parallel Computing, 1996
Proceedings of the 10th international conference on Supercomputing, 1996
1995
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction.
Sci. Program., 1995
Parallel Process. Lett., 1995
J. Exp. Theor. Artif. Intell., 1995
Compiling Array Statements for Efficient Execution on Distributed-Memory Machines: Two-Level Mappings.
Proceedings of the Languages and Compilers for Parallel Computing, 1995
Proceedings of IPPS '95, 1995
1994
Parallel Dynamic Simulation of Multiple Manipulator Systems: Temporal Versus Spatial Methods.
IEEE Trans. Syst. Man Cybern. Syst., 1994
Efficient Dynamic Simulation of Multiple Manipulator Systems with Singular Configurations.
IEEE Trans. Syst. Man Cybern. Syst., 1994
Implementing Fast Fourier Transforms on Distributed-Memory Multiprocessors Using Data Redistributions.
Parallel Process. Lett., 1994
EXTENT: a portable programming environment for designing and implementing high-performance block recursive algorithms.
Proceedings of Supercomputing '94, 1994
Incremental Generation of Index Sets for Array Statement Execution on Distributed-Memory Machines.
Proceedings of the Languages and Compilers for Parallel Computing, 1994
A Clustered Reduced Communication Element by Element Preconditioned Conjugate Gradient Algorithm for Finite Element Computations.
Proceedings of the 8th International Symposium on Parallel Processing, 1994
Proceedings of the 8th International Conference on Supercomputing, 1994
Proceedings of the 8th International Conference on Supercomputing, 1994
Communication-Efficient Implementation of Block Recursive Algorithms on Distributed-Memory Machines.
Proceedings of the 1994 International Conference on Parallel and Distributed Systems, 1994
1993
J. Parallel Distributed Comput., 1993
Proceedings of Supercomputing '93, 1993
A Methodology for Generating Efficient Disk-Based Algorithms from Tensor Product Formulas.
Proceedings of the Languages and Compilers for Parallel Computing, 1993
A Parallel Progressive Refinement Image Rendering Algorithm on a Scalable Multithreaded VLSI Processor Array.
Proceedings of the 1993 International Conference on Parallel Processing, 1993
On Compiling Array Expressions for Efficient Execution on Distributed-Memory Machines.
Proceedings of the 1993 International Conference on Parallel Processing, 1993
Proceedings of the 1993 International Conference on Parallel Processing, 1993
Proceedings of the 1993 International Conference on Parallel Processing, 1993
Architectural Synthesis of Performance-Driven Multipliers with Accumulator Interleaving.
Proceedings of the 30th Design Automation Conference. Dallas, 1993
1992
Toward super-real-time simulation of robotic mechanisms using a parallel integration method.
IEEE Trans. Syst. Man Cybern., 1992
J. Parallel Distributed Comput., 1992
Proceedings of the Fourth IEEE Symposium on Parallel and Distributed Processing, 1992
On Data Dependence Analysis for Compiling Programs on Distributed-Memory Machines (Extended Abstract).
Proceedings of the 2nd SIGPLAN Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors, Boulder, Colorado, September 30, 1992
Proceedings of the 2nd SIGPLAN Workshop on Languages, Compilers, and Run-Time Environments for Distributed Memory Multiprocessors, Boulder, Colorado, September 30, 1992
Proceedings of Supercomputing '92, 1992
On the Synthesis of Parallel Programs from Tensor Product Formulas for Block Recursive Algorithms.
Proceedings of the Languages and Compilers for Parallel Computing, 1992
Proceedings of the 1992 IEEE International Conference on Robotics and Automation, 1992
1991
IEEE Trans. Parallel Distributed Syst., 1991
IEEE Trans. Parallel Distributed Syst., 1991
Proceedings of Supercomputing '91, 1991
Proceedings of the Third ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPOPP), 1991
Proceedings of the 1991 IEEE International Conference on Robotics and Automation, 1991
Computer Graphics Rendering on a Shared Memory Multiprocessor.
Proceedings of the International Conference on Parallel Processing, 1991
Multifrontal Factorization of Sparse Matrices on Shared-Memory Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1991
1990
Parallel Comput., 1990
J. Parallel Distributed Comput., 1990
Proceedings of the Parallel Architectures (Postconference PARBASE-90), 1990
Tiling of Iteration Spaces for Multicomputers.
Proceedings of the 1990 International Conference on Parallel Processing, 1990
Proceedings of the ACM 18th Annual Computer Science Conference on Cooperation, 1990
1989
IEEE Trans. Robotics Autom., 1989
Efficient sparse matrix factorization for circuit simulation on vector supercomputers.
IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1989
Communication reduction for distributed sparse matrix factorization on a processor mesh.
Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors.
Proceedings of Supercomputing '89, Reno, NV, USA, November 12-17, 1989
Proceedings of the 3rd International Conference on Supercomputing, 1989
Optimal Static Scheduling of Sequential Loops on Multiprocessors.
Proceedings of the International Conference on Parallel Processing, 1989
1988
Iterative Algorithms for Solution of Large Sparse Systems of Linear Equations on Hypercubes.
IEEE Trans. Computers, 1988
Parallelization and performance evaluation of circuit simulation on a shared-memory multiprocessor.
Proceedings of the 2nd International Conference on Supercomputing, 1988
Proceedings of the 2nd International Conference on Supercomputing, 1988
Proceedings of the 1988 IEEE International Conference on Robotics and Automation, 1988
Proceedings of International Conference on Neural Networks (ICNN'88), 1988
Proceedings of International Conference on Neural Networks (ICNN'88), 1988
Comparative analysis of approaches to hardware acceleration for sparse-matrix factorization.
Proceedings of the Computer Design: VLSI in Computers and Processors, 1988
1987
IEEE Trans. Computers, 1987
Proceedings of the Supercomputing, 1987
Mapping Finite Element Graphs onto Processor Meshes.
Proceedings of the International Conference on Parallel Processing, 1987
1985
Proceedings of the 22nd ACM/IEEE Conference on Design Automation, 1985