Jeffrey S. Vetter

Orcid: 0000-0002-2449-6720

According to our database, Jeffrey S. Vetter authored at least 280 papers between 1994 and 2024.

Awards

IEEE Fellow 2017, "For contributions to high performance computing".

Bibliography

2024
Large language model evaluation for high-performance computing software development.
Concurr. Comput. Pract. Exp., November, 2024

MatRIS: Addressing the Challenges for Portability and Heterogeneity Using Tasking for Matrix Decomposition (Cholesky).
Proceedings of the Asynchronous Many-Task Systems and Applications, 2024

IRIS Reimagined: Advancements in Intelligent Runtime System for Task-Based Programming.
Proceedings of the Asynchronous Many-Task Systems and Applications, 2024

eCC++: A Compiler Construction Framework for Embedded Domain-Specific Languages.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

IRIS: Exploring Performance Scaling of the Intelligent Runtime System and its Dynamic Scheduling Policies.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2024

CHARM-SYCL & IRIS: A Tool Chain for Performance Portability on Extremely Heterogeneous Systems.
Proceedings of the 20th IEEE International Conference on e-Science, 2024

2023
Abisko: Deep codesign of an architecture for spiking neural networks using novel neuromorphic materials.
Int. J. High Perform. Comput. Appl., July, 2023

Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation.
CoRR, 2023

OpenACC Unified Programming Environment for Multi-hybrid Acceleration with GPU and FPGA.
Proceedings of the High Performance Computing, 2023

Experience Migrating OpenCL to SYCL: A Case Study on Searches for Potential Off-Target Sites of Cas9 RNA-Guided Endonucleases on AMD GPUs.
Proceedings of the 36th IEEE International System-on-Chip Conference, 2023

Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Mixed-Precision S/DGEMM Using the TF32 and TF64 Frameworks on Low-Precision AI Tensor Cores.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

FFTX-IRIS: Towards Performance Portability and Heterogeneity for SPIRAL Generated Code.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

MatRIS: Multi-level Math Library Abstraction for Heterogeneity and Performance Portability using IRIS Runtime.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Performance Evaluation of Heterogeneous GPU Programming Frameworks for Hemodynamic Simulations.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Julia as a unifying end-to-end workflow language on the Frontier exascale system.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

CHARM-SYCL: New Unified Programming Environment for Multiple Accelerator Types.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Tiling Framework for Heterogeneous Computing of Matrix based Tiled Algorithms.
Proceedings of the 2nd International Workshop on Extreme Heterogeneity Solutions, 2023

A MultiGPU Performance-Portable Solution for Array Programming Based on Kokkos.
Proceedings of the 9th ACM SIGPLAN International Workshop on Libraries, 2023

A Benchmark Suite for Improving Performance Portability of the SYCL Programming Model.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2023

A 3D Implementation of Convolutional Neural Network for Fast Inference.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2023

Understanding Performance Portability of SYCL Kernels: A Case Study with the All-Pairs Distance Calculation in Bioinformatics on GPUs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Understanding SYCL Portability for Pseudorandom Number Generation: a Case Study with Gene-Expression Connectivity Mapping.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Evaluating performance and portability of high-level programming models: Julia, Python/Numba, and Kokkos on exascale nodes.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Arithmetic Primitives for Efficient Neuromorphic Computing.
Proceedings of the IEEE International Conference on Rebooting Computing, 2023

Experience Deploying Graph Applications on GPUs with SYCL.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation.
Proceedings of the 52nd International Conference on Parallel Processing Workshops, 2023

On-Sensor Data Filtering using Neuromorphic Computing for High Energy Physics Experiments.
Proceedings of the 2023 International Conference on Neuromorphic Systems, 2023

IRIS-DMEM: Efficient Memory Management for Heterogeneous Computing.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

Errant Beam Detection Using the AMD Versal ACAP and Vitis AI.
Proceedings of the IEEE High Performance Extreme Computing Conference, 2023

Understanding Portability of Automotive Workload: A Case Study with the Points-to-Image Kernel in SYCL on Heterogeneous Computing Platforms.
Proceedings of the 15th Workshop on General Purpose Processing Using GPU, 2023

Accelerating Hyperdimensional Classifier with SYCL.
Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022
Preparing for the Future - Rethinking Proxy Applications.
Comput. Sci. Eng., 2022

Bridging HPC Communities through the Julia Programming Language.
CoRR, 2022

Encoding Integers and Rationals on Neuromorphic Computers using Virtual Neuron.
CoRR, 2022

Preparing for the Future - Rethinking Proxy Apps.
CoRR, 2022

Evaluating Nonuniform Reduction in HIP and SYCL on GPUs.
Proceedings of the 8th IEEE/ACM International Workshop on Data Analysis and Reduction for Big Scientific Data, 2022

KokkACC: Enhancing Kokkos with OpenACC.
Proceedings of the 9th Workshop on Accelerator Programming Using Directives, 2022

MAPredict: Static Analysis Driven Memory Access Prediction Framework for Modern CPUs.
Proceedings of the High Performance Computing - 37th International Conference, 2022

Adrastea: An Efficient FPGA Design Environment for Heterogeneous Scientific Computing and Machine Learning.
Proceedings of the Accelerating Science and Engineering Discoveries Through Integrated Research Infrastructure for Experiment, Big Data, Modeling and Simulation, 2022

LaRIS: Targeting Portability and Productivity for LAPACK Codes on Extreme Heterogeneous Systems by Using IRIS.
Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2022

SparseLU, A Novel Algorithm and Math Library for Sparse LU Factorization.
Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

Design and analysis of CXL performance models for tightly-coupled heterogeneous computing.
Proceedings of the ExHET@PPoPP 2022: Proceedings of the 1st International Workshop on Extreme Heterogeneity Solutions, 2022

Leveraging Compiler-Based Translation to Evaluate a Diversity of Exascale Platforms.
Proceedings of the IEEE/ACM International Workshop on Performance, 2022

Evaluating HPC Kernels for Processing in Memory.
Proceedings of the 2022 International Symposium on Memory Systems, 2022

Evaluating Unified Memory Performance in HIP.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Integer Sum Reduction with OpenMP on an AMD MI100 GPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Virtual Neuron: A Neuromorphic Approach for Encoding Numbers.
Proceedings of the IEEE International Conference on Rebooting Computing, 2022

A Study on Atomics-based Integer Sum Reduction in HIP on AMD GPU.
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Ultra Low Latency Machine Learning for Scientific Edge Applications.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

A Portable and Heterogeneous LU Factorization on IRIS.
Proceedings of the Euro-Par 2022: Parallel Processing Workshops, 2022

High Performance Adaptive Physics Refinement to Enable Large-Scale Tracking of Cancer Cell Trajectory.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU.
Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2022

Performance portability study of epistasis detection using SYCL on NVIDIA GPU.
Proceedings of the BCB '22: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook, Illinois, USA, August 7, 2022

2021
Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm.
IEEE Trans. Parallel Distributed Syst., 2021

Optimization with the OpenACC-to-FPGA framework on the Arria 10 and Stratix 10 FPGAs.
Parallel Comput., 2021

End-to-end online performance data capture and analysis for scientific workflows.
Future Gener. Comput. Syst., 2021

A Hierarchical Task Scheduler for Heterogeneous Computing.
Proceedings of the High Performance Computing - 36th International Conference, 2021

Comparing LLC-Memory Traffic between CPU and GPU Architectures.
Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2021

PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips.
Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Flacc: Towards OpenACC support for Fortran in the LLVM Ecosystem.
Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Toward Evaluating High-Level Synthesis Portability and Performance between Intel and Xilinx FPGAs.
Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich, Germany, April, 2021

A Memory Efficient Lock-Free Circular Queue.
Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Evaluating the Performance of Integer Sum Reduction on an Intel GPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Evaluating CUDA Portability with HIPCL and DPCT.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Evaluating the Performance of Integer Sum Reduction in SYCL on GPUs.
Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

IRIS: A Portable Runtime System Exploiting Multiple Heterogeneous Programming Systems.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Toward Performance Portable Programming for Heterogeneous Systems on a Chip: A Case Study with Qualcomm Snapdragon SoC.
Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Static Graphs for Coding Productivity in OpenACC.
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems.
Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

2020
Understanding the Impact of Memory Access Patterns in Intel Processors.
Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2020

CCAMP: an integrated translation and optimization framework for OpenACC and OpenMP.
Proceedings of the International Conference for High Performance Computing, 2020

Evaluating the Performance and Portability of Contemporary SYCL Implementations.
Proceedings of the IEEE/ACM International Workshop on Performance, 2020

OpenACC Profiling Support for Clang and LLVM using Clacc and TAU.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

The Minos Computing Library: efficient parallel programming for extremely heterogeneous systems.
Proceedings of the GPGPU@PPoPP '20: 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit colocated with 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

In-Depth Optimization with the OpenACC-to-FPGA Framework on an Arria 10 FPGA.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Productive Hardware Designs using Hybrid HLS-RTL Development.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Cash: A Single-Source Hardware-Software Codesign Framework for Rapid Prototyping.
Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Deffe: a data-efficient framework for performance characterization in domain-specific computing.
Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020

MEPHESTO: Modeling Energy-Performance in Heterogeneous SoCs and Their Trade-Offs.
Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019
Performance portability study for massively parallel computational fluid dynamics application on scalable heterogeneous architectures.
J. Parallel Distributed Comput., 2019

Implementing efficient data compression and encryption in a persistent key-value store for HPC.
Int. J. High Perform. Comput. Appl., 2019

Enhancing Monte Carlo proxy applications on GPUs.
Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training.
Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

TensorFlow Doing HPC.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

CCAMP: OpenMP and OpenACC Interoperable Framework.
Proceedings of the Euro-Par 2019: Parallel Processing Workshops, 2019

Analyzing the suitability of contemporary 3D-stacked PIM architectures for HPC scientific applications.
Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

POSTER: Tango: An Optimizing Compiler for Just-In-Time RTL Simulation.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
Characterizing the performance benefit of hybrid memory system for HPC applications.
Parallel Comput., 2018

Aspen-based performance and energy modeling frameworks.
J. Parallel Distributed Comput., 2018

The future of scientific workflows.
Int. J. High Perform. Comput. Appl., 2018

Siena: exploring the design space of heterogeneous memory systems.
Proceedings of the International Conference for High Performance Computing, 2018

DRAGON: breaking GPU memory capacity limits with direct NVM access.
Proceedings of the International Conference for High Performance Computing, 2018

Juggler: a dependence-aware task-based execution framework for GPUs.
Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Prometheus: Coherent Exploration of Hardware and Software Optimizations Using Aspen.
Proceedings of the 26th IEEE International Symposium on Modeling, 2018

NVIDIA Tensor Core Programmability, Performance & Precision.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Designing Algorithms for the EMU Migrating-threads-based Architecture.
Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Tuyere: enabling scalable memory workloads for system exploration.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

2017
Architectures for the Post-Moore Era.
IEEE Micro, 2017

PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows.
Int. J. High Perform. Comput. Appl., 2017

Performance Portability in Extreme Scale Computing (Dagstuhl Seminar 17431).
Dagstuhl Reports, 2017

Addressing Read-Disturbance Issue in STT-RAM by Data Compression and Selective Duplication.
IEEE Comput. Archit. Lett., 2017

Design and Analysis of Soft-Error Resilience Mechanisms for GPU Register File.
Proceedings of the 30th International Conference on VLSI Design and 16th International Conference on Embedded Systems, 2017

PapyrusKV: a high-performance parallel key-value store for distributed NVM architectures.
Proceedings of the International Conference for High Performance Computing, 2017

Durango: Scalable Synthetic Workload Generation for Extreme-Scale Application Performance Modeling and Simulation.
Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2017

Architecting SOT-RAM Based GPU Register File.
Proceedings of the 2017 IEEE Computer Society Annual Symposium on VLSI, 2017

Design and Implementation of Papyrus: Parallel Aggregate Persistent Storage.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Language-Based Optimizations for Persistence on Nonvolatile Main Memory Systems.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Distributed workflows for modeling experimental data.
Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

2016
EqualWrites: Reducing Intra-Set Write Variations for Enhancing Lifetime of Non-Volatile Caches.
IEEE Trans. Very Large Scale Integr. Syst., 2016

A Survey Of Techniques for Architecting DRAM Caches.
IEEE Trans. Parallel Distributed Syst., 2016

A Survey of Software Techniques for Using Non-Volatile Memories for Storage and Main Memory Systems.
IEEE Trans. Parallel Distributed Syst., 2016

A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems.
IEEE Trans. Parallel Distributed Syst., 2016

A Survey of Techniques for Modeling and Improving Reliability of Computing Systems.
IEEE Trans. Parallel Distributed Syst., 2016

Reliability Tradeoffs in Design of Volatile and Nonvolatile Caches.
J. Circuits Syst. Comput., 2016

A Study of Power-Performance Modeling Using a Domain-Specific Language.
Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Improving DRAM Bandwidth Utilization with MLP-Aware OS Paging.
Proceedings of the Second International Symposium on Memory Systems, 2016

Toward an End-to-End Framework for Modeling, Monitoring and Anomaly Detection for Scientific Workflows.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

OpenACC to FPGA: A Framework for Directive-Based High-Performance Reconfigurable Computing.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Preparing for Supercomputing's Sixth Wave.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

NVL-C: Static Analysis Techniques for Efficient, Correct Programming of Non-Volatile Main Memory Systems.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Reducing Soft-error Vulnerability of Caches using Data Compression.
Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016

2015
A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-Volatile On-Chip Caches.
IEEE Trans. Parallel Distributed Syst., 2015

Automated Design Space Exploration with Aspen.
Sci. Program., 2015

Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures.
IEEE Micro, 2015

A Survey of CPU-GPU Heterogeneous Computing Techniques.
ACM Comput. Surv., 2015

Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing.
Comput. Sci. Eng., 2015

AYUSH: A Technique for Extending Lifetime of SRAM-NVM Hybrid Caches.
IEEE Comput. Archit. Lett., 2015

Examining recent many-core architectures and programming models using SHOC.
Proceedings of the 6th International Workshop on Performance Modeling, 2015

FITL: extending LLVM for the translation of fault-injection directives.
Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015

An OpenACC-based unified programming model for multi-accelerator systems.
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

AYUSH: Extending Lifetime of SRAM-NVM Way-Based Hybrid Caches Using Wear-Leveling.
Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

COMPASS: A Framework for Automated Performance Modeling and Prediction.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Automated Characterization of Parallel Application Communication Patterns.
Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches.
Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014
A Survey of Methods for Analyzing and Improving GPU Energy Efficiency.
ACM Comput. Surv., 2014

neCODEC: nearline data compression for scientific applications.
Clust. Comput., 2014

BlackjackBench: Portable Hardware Characterization with Automated Results' Analysis.
Comput. J., 2014

Advanced Application Support for Improved GPU Utilization on Keeneland.
Proceedings of the Annual Conference of the Extreme Science and Engineering Discovery Environment, 2014

Quantitatively Modeling Application Resilience with the Data Vulnerability Factor.
Proceedings of the International Conference for High Performance Computing, 2014

Application characterization using Oxbow toolkit and PADS infrastructure.
Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

OpenARC: extensible OpenACC compiler framework for directive-based accelerator programming study.
Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

EqualChance: Addressing Intra-set Write Variation to Increase Lifetime of Non-volatile Caches.
Proceedings of the 2nd Workshop on Interactions of NVM/Flash with Operating Systems and Workloads, 2014

Evaluating Performance Portability of OpenACC.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

LastingNVCache: A Technique for Improving the Lifetime of Non-volatile Caches.
Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

AsHES Keynote.
Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Improving energy efficiency of embedded DRAM caches for high-end computing systems.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

OpenARC: open accelerator research compiler for directive-based, efficient heterogeneous computing.
Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

WriteSmoothing: improving lifetime of non-volatile caches using intra-set wear-leveling.
Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.
Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013
Special Issue: Selected Papers from Super Computing 2012.
Sci. Program., 2013

Modeling synthetic aperture radar computation with Aspen.
Int. J. High Perform. Comput. Appl., 2013

Quantifying Architectural Requirements of Contemporary Extreme-Scale Scientific Applications.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Characterizing the Impact of Prefetching on Scientific Application Performance.
Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach.
Proceedings of the International Conference for High Performance Computing, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.
Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Diagnosis and optimization of application prefetching performance.
Proceedings of the International Conference on Supercomputing, 2013

FlexiWay: A cache energy saving technique using fine-grained cache reconfiguration.
Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection.
Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.
Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Contemporary High Performance Computing - From Petascale toward Exascale.
Chapman and Hall / CRC computational science series, CRC Press, ISBN: 978-1-4665-6834-1, 2013

2012
BlackjackBench: portable hardware characterization.
SIGMETRICS Perform. Evaluation Rev., 2012

HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems.
J. Parallel Distributed Comput., 2012

Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing.
Int. J. Parallel Program., 2012

RXIO: Design and implementation of high performance RDMA-capable GridFTP.
Comput. Electr. Eng., 2012

Aspen: a domain specific language for performance modeling.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Early evaluation of directive-based GPU programming models for productive exascale computing.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

PCM-Based Durable Write Cache for Fast Disk I/O.
Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Efficient Quality Threshold Clustering for Parallel Architectures.
Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

GA-GPU: extending a library-based global address space programming model for scalable heterogeneous computing systems.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures.
Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011
Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures.
IEEE Micro, 2011

The International Exascale Software Project roadmap.
Int. J. High Perform. Comput. Appl., 2011

Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community.
Comput. Sci. Eng., 2011

MTAAP Introduction.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Virtual Topologies for Scalable Resource Management and Contention Attenuation in a Global Address Space Model on the Cray XT5.
Proceedings of the International Conference on Parallel Processing, 2011

Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort.
Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Quantifying NUMA and contention effects in multi-GPU systems.
Proceedings of 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010
On the Path to Exascale.
Int. J. Distributed Syst. Technol., 2010

Cooperative server clustering for a scalable GAS model on petascale cray XT5 systems.
Comput. Sci. Res. Dev., 2010

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures.
Proceedings of the Conference on High Performance Computing Networking, 2010

Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Initial characterization of parallel NFS implementations.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

MTAAP 2010 Welcome.
Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Efficient Zero-Copy Noncontiguous I/O for Globus on InfiniBand.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Maestro: Data Orchestration and Tuning for OpenCL Devices.
Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Enabling a highly-scalable global address space model for petascale computing.
Proceedings of the 7th Conference on Computing Frontiers, 2010

Toward exascale computational science with heterogeneous processing.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

The Scalable Heterogeneous Computing (SHOC) benchmark suite.
Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009
Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study.
Parallel Comput., 2009

Revolutionary technologies for acceleration of emerging petascale applications.
Parallel Comput., 2009

Design, implementation, and evaluation of transparent pNFS on Lustre.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A Holistic Approach for Performance Measurement and Analysis for Petascale Applications.
Proceedings of the Computational Science, 2009

Accelerating S3D: A GPGPU Case Study.
Proceedings of the Euro-Par 2009, 2009

2008
Performance characteristics of biomolecular simulations on high-end systems with multi-core processors.
Parallel Comput., 2008

An Evaluation of the Oak Ridge National Laboratory Cray XT3.
Int. J. High Perform. Comput. Appl., 2008

DARPA's HPCS Program: History, Models, Tools, Languages.
Adv. Comput., 2008

Wide-area performance profiling of 10GigE and InfiniBand technologies.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Early evaluation of IBM BlueGene/P.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

A Methodology for Developing High Fidelity Communication Models for Large-Scale Applications Targeted on Multicore Systems.
Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

Experimental Analysis of InfiniBand Transport Services on WAN.
Proceedings of The 2008 IEEE International Conference on Networking, 2008

Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors.
Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

Performance characterization and optimization of parallel I/O on the Cray XT.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Impact of multicores on large-scale molecular dynamics simulations.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

ParColl: Partitioned Collective I/O on the Cray XT.
Proceedings of the 2008 International Conference on Parallel Processing, 2008

HPC Interconnection Networks: The Key to Exascale Computing.
Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

Empirical Analysis of a Large-Scale Hierarchical Storage System.
Proceedings of the Euro-Par 2008, 2008

Xen-Based HPC: A Parallel I/O Perspective.
Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007
FASE: A Framework for Scalable Performance Prediction of HPC Systems and Applications.
Simul., 2007

Throughput Improvement of Molecular Dynamics Simulations Using Reconfigurable Computing.
Scalable Comput. Pract. Exp., 2007

A framework for performance analysis of Co-Array Fortran.
Concurr. Comput. Pract. Exp., 2007

Using FPGA Devices to Accelerate Biomolecular Simulations.
Computer, 2007

Performance evaluation of the Cray XT3 configured with dual core Opteron processors.
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Analysis of a Computational Biology Simulation Technique on Emerging Processing Architectures.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

An Exploration of Performance Attributes for Symbolic Modeling of Emerging Processing Devices.
Proceedings of the High Performance Computing and Communications, 2007

Virtual Cluster Management with Xen.
Proceedings of the Euro-Par 2007 Workshops: Parallel Processing, 2007

Balancing productivity and performance on the Cell Broadband Engine.
Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Exploiting Lustre File Joining for Effective Collective IO.
Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Sensitivity Analysis of Biomolecular Simulations using Symbolic Models.
Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, 2007

An Application Specific Memory Characterization Technique for Co-processor Accelerators.
Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

2006
Kernel-level single system image for petascale computing.
ACM SIGOPS Oper. Syst. Rev., 2006

Performance evaluation of high-speed interconnects using dense communication patterns.
Parallel Comput., 2006

Performance characterization of molecular dynamics techniques for biomolecular simulations.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Early evaluation of the Cray XT3.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A framework to develop symbolic performance models of parallel applications.
Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Characterization of Scientific Workloads on Systems with Multi-Core Processors.
Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

An Analysis of System Balance Requirements for Scientific Applications.
Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Topic 2: Performance Prediction and Evaluation.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Hierarchical Model Validation of Symbolic Performance Models of Scientific Kernels.
Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

IPMI-based Efficient Notification Framework for Large Scale Cluster Computing.
Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005
Performance Evaluation of the Cray X1 Distributed Shared-Memory Architecture.
IEEE Micro, 2005

Evaluating high-performance computers.
Concurr. Pract. Exp., 2005

Capturing Petascale Application Characteristics with the Sequoia Toolkit.
Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Accelerating Scientific Applications with the SRC-6 Reconfigurable Computer: Methodologies and Analysis.
Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Performance Evaluation of the SGI Altix 3700.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization.
Proceedings of the Computational Science, 2005

A Performance Measurement Infrastructure for Co-array Fortran.
Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Balancing FPGA Resource Utilities.
Proceedings of The 2005 International Conference on Engineering of Reconfigurable Systems and Algorithms, 2005

2004
Performance evaluation of the Cray X1 distributed shared memory architecture.
Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects, 2004

Topic 2: Performance Evaluation.
Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003
Communication characteristics of large-scale scientific applications for contemporary cluster architectures.
J. Parallel Distributed Comput., 2003

2002
Dynamic statistical profiling of communication activity in distributed applications.
Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2002

An empirical performance evaluation of scalable scientific applications.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Asserting performance expectations.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Scalable analysis techniques for microprocessor performance counter metrics.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

An overview of the BlueGene/L Supercomputer.
Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Local Discovery of System Architecture - Application Parameter Sensitivity: An Empirical Technique for Adaptive Grid Applications.
Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

2001
An Integrated Performance Visualizer for MPI/OpenMP Programs.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications.
Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Statistical scalability analysis of communication operations in distributed applications.
Proceedings of the 2001 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'01), 2001

2000
Real-Time Performance Monitoring, Adaptive Control, and Interactive Steering of Computational Grids.
Int. J. High Perform. Comput. Appl., 2000

Dynamic Software Testing of MPI Applications with Umpire.
Proceedings of Supercomputing 2000, 2000

Performance analysis of distributed applications using automatic classification of communication inefficiencies.
Proceedings of the 14th international conference on Supercomputing, 2000

Performance Issues in Parallel Processing Systems.
Proceedings of the Performance Evaluation: Origins and Directions, 2000

1999
Techniques for high-performance computational steering.
IEEE Concurr., 1999

Managing Performance Analysis with Dynamic Statistical Projection Pursuit.
Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Experiences with Computational Steering on Existing Scientific Applications.
Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Optimizations for Language-Directed Computational Steering.
Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1998
From interactive applications to distributed laboratories.
IEEE Concurr., 1998

Falcon: On-line monitoring for steering parallel programs.
Concurr. Pract. Exp., 1998

Techniques for Delayed Binding of Monitoring Mechanisms to Application-Specific Instrumentation Points.
Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Autopilot: Adaptive Control of Distributed Applications.
Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998

Computational Steering.
Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, 1998

1997
Computational Steering Annotated Bibliography.
ACM SIGPLAN Notices, 1997

High Performance Computational Steering of Physical Simulations.
Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

1996
Models for computational steering.
Proceedings of the Third International Conference on Configurable Distributed Systems, 1996

1995
Progress: A Toolkit for Interactive Program Steering.
Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994
An annotated bibliography of interactive program steering.
ACM SIGPLAN Notices, 1994

