Sameer Shende

ORCID: 0000-0002-2592-669X

According to our database, Sameer Shende authored at least 97 papers between 1996 and 2024.

Bibliography

2024
Providing a Flexible and Comprehensive Software Stack Via Spack, an Extreme-Scale Scientific Software Stack, and Software Development Kits.
Comput. Sci. Eng., 2024

Integration of Modern HPC Performance Tools in Vlasiator for Exascale Analysis and Optimization.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
Towards Smarter Schedulers: Molding Jobs into the Right Shape via Monitoring and Modeling.
Proceedings of the High Performance Computing, 2023

Generating and Scaling a Multi-Language Test-Suite for MPI.
Proceedings of the 30th European MPI Users' Group Meeting, 2023

2022
Translating High-Performance Computing Tools From Research to Practice: Experiences With the TAU Performance System.
Comput. Sci. Eng., 2022

Enabling Global MPI Process Addressing in MPI Applications.
Proceedings of the EuroMPI/USA'22: 29th European MPI Users' Group Meeting, Chattanooga, TN, USA, September 26, 2022

TAU Performance System.
Proceedings of the IWOCL'22: International Workshop on OpenCL, Bristol, United Kingdom, May 10, 2022

2020
OpenACC Profiling Support for Clang and LLVM using Clacc and TAU.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

Multi-Platform SYCL Profiling with TAU.
Proceedings of the IWOCL '20: International Workshop on OpenCL, 2020

Identifying Optimization Opportunities Using Memory Access Tracing in OpenSHMEM Runtimes with the TAU Performance System.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

2019
Checkpoint/restart approaches for a thread-based MPI runtime.
Parallel Comput., 2019

Multi-Level Performance Instrumentation for Kokkos Applications Using TAU.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Mixing ranks, tasks, progress and nonblocking collectives.
Proceedings of the 26th European MPI Users' Group Meeting, 2019

Towards Runtime Analytics in a Parallel Performance System.
Proceedings of the 17th International Conference on High Performance Computing & Simulation, 2019

A Plugin Architecture for the TAU Performance System.
Proceedings of the 48th International Conference on Parallel Processing, 2019

2018
MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU.
Parallel Comput., 2018

Transparent High-Speed Network Checkpoint/Restart in MPI.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Tracking Memory Usage in OpenSHMEM Runtimes with the TAU Performance System.
Proceedings of the OpenSHMEM and Related Technologies. OpenSHMEM in the Era of Extreme Heterogeneity, 2018

Performance Visualization for TAU Instrumented Scientific Workflows.
Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2018), 2018

2017
MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU.
Proceedings of the 24th European MPI Users' Group Meeting, 2017

Performance Analysis of OpenSHMEM Applications with TAU Commander.
Proceedings of the OpenSHMEM and Related Technologies. Big Compute and Big Data Convergence, 2017

Towards a Better Expressiveness of the Speedup Metric in MPI Context.
Proceedings of the 46th International Conference on Parallel Processing Workshops, 2017

2016
Introducing Task-Containers as an Alternative to Runtime-Stacking.
Proceedings of the 23rd European MPI Users' Group Meeting, EuroMPI 2016, 2016

Profiling Production OpenSHMEM Applications.
Proceedings of the OpenSHMEM and Related Technologies. Enhancing OpenSHMEM for Hybrid Environments, 2016

2015
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study.
Sci. Program., 2015

Design and performance characterization of electronic structure calculations on massively parallel supercomputers: a case study of GPAW on the Blue Gene/P architecture.
Concurr. Comput. Pract. Exp., 2015

An MPI Halo-Cell Implementation for Zero-Copy Abstraction.
Proceedings of the 22nd European MPI Users' Group Meeting, 2015

2014
Profiling Non-numeric OpenSHMEM Applications with the TAU Performance System.
Proceedings of the OpenSHMEM and Related Technologies. Experiences, Implementations, and Tools, 2014

Integrated Measurement for Cross-Platform OpenMP Performance Analysis.
Proceedings of the Using and Improving OpenMP for Devices, Tasks, and More, 2014

2013
Test-driven coarray parallelization of a legacy Fortran application.
Proceedings of the 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering, 2013

An early prototype of an autonomic performance environment for exascale.
Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers, 2013

Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions.
Proceedings of the 42nd International Conference on Parallel Processing, 2013

MIL: A language to build program analysis tools through static binary instrumentation.
Proceedings of the 20th Annual International Conference on High Performance Computing, 2013

2012
Performance characterization of global address space applications: a case study with NWChem.
Concurr. Comput. Pract. Exp., 2012

Hands-on Practical Hybrid Parallel Application Performance Engineering.
Proceedings of the Recent Advances in the Message Passing Interface, 2012

2011
Proceedings of the Encyclopedia of Parallel Computing, 2011

GPAW - massively parallel electronic structure calculations with Python-based software.
Proceedings of the International Conference on Computational Science, 2011

Poster: performance improvements of front tracking package.
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2011

Advances in the TAU Performance System.
Proceedings of the Tools for High Performance Computing 2011, 2011

Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir.
Proceedings of the Tools for High Performance Computing 2011, 2011

Characterizing I/O Performance Using the TAU Performance System.
Proceedings of the Applications, Tools and Techniques on the Road to Exascale Computing, Proceedings of the conference ParCo 2011, 31 August, 2011

Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs.
Proceedings of the International Conference on Parallel Processing, 2011

An Approach to Creating Performance Visualizations in a Parallel Profile Analysis Tool.
Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

2010
Improving the Scalability of Performance Evaluation Tools.
Proceedings of the Applied Parallel and Scientific Computing, 2010

Design and Implementation of a Hybrid Parallel Performance Measurement System.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Performance Analysis of Scientific and Engineering Applications Using MPInside and TAU.
Proceedings of the 12th IEEE International Conference on High Performance Computing and Communications, 2010

Score-P: A Unified Performance Measurement System for Petascale Applications.
Proceedings of the Competence in High Performance Computing 2010, 2010

2009
Performance Tool Integration in a GPU Programming Environment: Experiences with TAU and HMPP.
Proceedings of the Parallel Computing: From Multicores and GPU's to Petascale, 2009

A Generic and Configurable Source-Code Instrumentation Component.
Proceedings of the Computational Science, 2009

2008
Knowledge support and automation for performance analysis with PerfExplorer 2.0.
Sci. Program., 2008

Integrated parallel performance views.
Clust. Comput., 2008

Evolution of a Parallel Performance System.
Proceedings of the Tools for High Performance Computing, 2008

Performance Tool Workflows.
Proceedings of the Computational Science, 2008

Observing Performance Dynamics Using Parallel Profile Snapshots.
Proceedings of the Euro-Par 2008, 2008

Parametric Studies in Eclipse with TAU and PerfExplorer.
Proceedings of the Euro-Par 2008 Workshops, 2008

2007
Supporting Nested OpenMP Parallelism in the TAU Performance System.
Int. J. Parallel Program., 2007

Compensation of Measurement Overhead in Parallel Performance Profiling.
Int. J. High Perform. Comput. Appl., 2007

Performance modeling of component assemblies.
Concurr. Comput. Pract. Exp., 2007

Scalable, Automated Performance Analysis with TAU and PerfExplorer.
Proceedings of the Parallel Computing: Architectures, 2007

TAUoverSupermon: Low-Overhead Online Parallel Performance Monitoring.
Proceedings of the Euro-Par 2007, 2007

2006
The Tau Parallel Performance System.
Int. J. High Perform. Comput. Appl., 2006

A Component Architecture for High-Performance Scientific Computing.
Int. J. High Perform. Comput. Appl., 2006

Bridging the language gap in scientific computing: the Chasm approach.
Concurr. Comput. Pract. Exp., 2006

TAUg: Runtime Global Performance Data Access Using MPI.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006

Optimization of Instrumentation in Parallel Performance Evaluation Tools.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Workload Characterization Using the TAU Performance System.
Proceedings of the Applied Parallel Computing. State of the Art in Scientific Computing, 2006

Integrating TAU with Eclipse: A Performance Analysis System in an Integrated Development Environment.
Proceedings of the High Performance Computing and Communications, 2006

Early Experiences with KTAU on the IBM BG/L.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Kernel-Level Measurement for Integrated Parallel Performance Views: the KTAU Project.
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006

2005
Overhead Compensation in Performance Profiling.
Parallel Process. Lett., 2005

Performance technology for parallel and distributed component software.
Concurr. Pract. Exp., 2005

Performance Profiling Overhead Compensation for MPI Programs.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

A Scalable Approach to MPI Application Performance Analysis.
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2005

Phase-Based Parallel Performance Profiling.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Trace-Based Parallel Performance Overhead Compensation.
Proceedings of the High Performance Computing and Communications, 2005

Models for On-the-Fly Compensation of Measurement Overhead in Parallel Performance Profiling.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

2004
Performance Measurement and Modeling of Component Applications in a High Performance Computing Environment: A Case Study.
Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS 2004), 2004

Computational Quality of Service for Scientific Components.
Proceedings of the Component-Based Software Engineering, 7th International Symposium, 2004

2003
Performance Analysis Integration in the Uintah Software Development Cycle.
Int. J. Parallel Program., 2003

Integration and application of TAU in parallel Java environments.
Concurr. Comput. Pract. Exp., 2003

Online Performance Observation of Large-Scale Parallel Applications.
Proceedings of the Parallel Computing: Software Technology, 2003

Online Remote Trace Analysis of Parallel Applications on High-Performance Clusters.
Proceedings of the High Performance Computing, 5th International Symposium, 2003

A Performance Interface for Component-Based Applications.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Performance Instrumentation and Measurement for Terascale Systems.
Proceedings of the Computational Science - ICCS 2003, 2003

ParaProf: A Portable, Extensible, and Scalable Tool for Parallel Performance Profile Analysis.
Proceedings of the Euro-Par 2003. Parallel Processing, 2003

2002
Design and Prototype of a Performance Tool Interface for OpenMP.
J. Supercomput., 2002

Integrating Performance Analysis in the Uintah Software Development Cycle.
Proceedings of the High Performance Computing, 4th International Symposium, 2002

2001
Performance Technology for Complex Parallel and Distributed Systems.
Parallel Distributed Comput. Pract., 2001

On using SCALEA for performance analysis of distributed and parallel programs.
Proceedings of the 2001 ACM/IEEE conference on Supercomputing, 2001

Integration and applications of the TAU performance system in parallel Java environments.
Proceedings of the ACM 2001 Java Grande Conference, Stanford University, California, USA, 2001

2000
A Tool Framework for Static and Dynamic Analysis of Object-Oriented Software with Templates.
Proceedings of Supercomputing 2000, 2000

1999
A Runtime Monitoring Framework for the TAU Profiling System.
Proceedings of the Computing in Object-Oriented Parallel Environments, 1999

SMARTS: exploiting temporal locality and parallelism through vertical execution.
Proceedings of the 13th international conference on Supercomputing, 1999

1998
Portable profiling and tracing for parallel, scientific applications using C++.
Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1998

An IL converter and program database for analysis tools.
Proceedings of the SIGMETRICS Symposium on Parallel and Distributed Tools, 1998

Dynamic Performance Callstack Sampling: Merging TAU and DAQV.
Proceedings of the Applied Parallel Computing, 1998

1996
Event and state-based debugging in TAU: a prototype.
Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, 1996

