Jeffrey S. Vetter

Frank Y. Liu

Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2022

SparseLU, A Novel Algorithm and Math Library for Sparse LU Factorization.

Cameron Greenwalt

Proceedings of the 12th IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms, 2022

Design and analysis of CXL performance models for tightly-coupled heterogeneous computing.

Anthony M. Cabrera

Aaron R. Young

Proceedings of ExHET@PPoPP 2022: 1st International Workshop on Extreme Heterogeneity Solutions, 2022

Leveraging Compiler-Based Translation to Evaluate a Diversity of Exascale Platforms.

Jacob Lambert

Allen D. Malony

Proceedings of the IEEE/ACM International Workshop on Performance, 2022

Evaluating HPC Kernels for Processing in Memory.

Kazi Asifuzzaman

Proceedings of the 2022 International Symposium on Memory Systems, 2022

Evaluating Unified Memory Performance in HIP.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Integer Sum Reduction with OpenMP on an AMD MI100 GPU.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Virtual Neuron: A Neuromorphic Approach for Encoding Numbers.

Proceedings of the IEEE International Conference on Rebooting Computing, 2022

A Study on Atomics-based Integer Sum Reduction in HIP on AMD GPU.

Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022

IRIS-BLAS: Towards a Performance Portable and Heterogeneous BLAS Library.

Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Ultra Low Latency Machine Learning for Scientific Edge Applications.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

A Portable and Heterogeneous LU Factorization on IRIS.

Proceedings of the Euro-Par 2022: Parallel Processing Workshops, 2022

High Performance Adaptive Physics Refinement to Enable Large-Scale Tracking of Cancer Cell Trajectory.

Simbarashe Chidyagwai

Proceedings of the IEEE International Conference on Cluster Computing, 2022

Understanding Performance Portability of Bioinformatics Applications in SYCL on an NVIDIA GPU.

Proceedings of the IEEE International Conference on Bioinformatics and Biomedicine, 2022

Performance portability study of epistasis detection using SYCL on NVIDIA GPU.

Proceedings of the BCB '22: 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, Northbrook, Illinois, USA, August 7, 2022

2021

Analysis of GPU Data Access Patterns on Complex Geometries for the D3Q19 Lattice Boltzmann Algorithm.

IEEE Trans. Parallel Distributed Syst., 2021

Optimization with the OpenACC-to-FPGA framework on the Arria 10 and Stratix 10 FPGAs.

Parallel Comput., 2021

End-to-end online performance data capture and analysis for scientific workflows.

George Papadimitriou

Cong Wang

Karan Vahi

Future Gener. Comput. Syst., 2021

A Hierarchical Task Scheduler for Heterogeneous Computing.

Aaron R. Young

Dwaipayan Chakraborty

Proceedings of the High Performance Computing - 36th International Conference, 2021

Comparing LLC-Memory Traffic between CPU and GPU Architectures.

Allen D. Malony

Proceedings of the IEEE/ACM Redefining Scalability for Diversely Heterogeneous Architectures Workshop, 2021

PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips.

Yuanchao Xu

Mehmet Esat Belviranli

Xipeng Shen

Proceedings of the MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 2021

Flacc: Towards OpenACC support for Fortran in the LLVM Ecosystem.

Valentin Clement

Proceedings of the 7th IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC, 2021

Toward Evaluating High-Level Synthesis Portability and Performance between Intel and Xilinx FPGAs.

Proceedings of the IWOCL'21: International Workshop on OpenCL, Munich, Germany, April 2021

A Memory Efficient Lock-Free Circular Queue.

Proceedings of the IEEE International Symposium on Circuits and Systems, 2021

Evaluating the Performance of Integer Sum Reduction on an Intel GPU.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Evaluating CUDA Portability with HIPCL and DPCT.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

Evaluating the Performance of Integer Sum Reduction in SYCL on GPUs.

Proceedings of the ICPP Workshops 2021: 50th International Conference on Parallel Processing, 2021

IRIS: A Portable Runtime System Exploiting Multiple Heterogeneous Programming Systems.

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Toward Performance Portable Programming for Heterogeneous Systems on a Chip: A Case Study with Qualcomm Snapdragon SoC.

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

Static Graphs for Coding Productivity in OpenACC.

Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021

OpenMP Target Task: Tasking and Target Offloading on Heterogeneous Systems.

Proceedings of the Euro-Par 2021: Parallel Processing Workshops, 2021

2020

Understanding the Impact of Memory Access Patterns in Intel Processors.

Allen D. Malony

Proceedings of the IEEE/ACM Workshop on Memory Centric High Performance Computing, 2020

CCAMP: an integrated translation and optimization framework for OpenACC and OpenMP.

Proceedings of the International Conference for High Performance Computing, 2020

Evaluating the Performance and Portability of Contemporary SYCL Implementations.

Beau Johnston

Josh Milthorpe

Proceedings of the IEEE/ACM International Workshop on Performance, 2020

OpenACC Profiling Support for Clang and LLVM using Clacc and TAU.

Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

The Minos Computing Library: efficient parallel programming for extremely heterogeneous systems.

Proceedings of the GPGPU@PPoPP '20: 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit colocated with 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2020

In-Depth Optimization with the OpenACC-to-FPGA Framework on an Arria 10 FPGA.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium Workshops, 2020

Productive Hardware Designs using Hybrid HLS-RTL Development.

Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Cash: A Single-Source Hardware-Software Codesign Framework for Rapid Prototyping.

Proceedings of the FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2020

Deffe: a data-efficient framework for performance characterization in domain-specific computing.

Dwaipayan Chakraborty

Proceedings of the 17th ACM International Conference on Computing Frontiers, 2020

MEPHESTO: Modeling Energy-Performance in Heterogeneous SoCs and Their Trade-Offs.

Proceedings of the PACT '20: International Conference on Parallel Architectures and Compilation Techniques, 2020

2019

Performance portability study for massively parallel computational fluid dynamics application on scalable heterogeneous architectures.

J. Parallel Distributed Comput., 2019

Implementing efficient data compression and encryption in a persistent key-value store for HPC.

Int. J. High Perform. Comput. Appl., 2019

Enhancing Monte Carlo proxy applications on GPUs.

Proceedings of the 2019 IEEE/ACM Performance Modeling, 2019

Runtime Concurrency Control and Operation Scheduling for High Performance Neural Network Training.

Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium, 2019

TensorFlow Doing HPC.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2019

CCAMP: OpenMP and OpenACC Interoperable Framework.

Proceedings of the Euro-Par 2019: Parallel Processing Workshops, 2019

Analyzing the suitability of contemporary 3D-stacked PIM architectures for HPC scientific applications.

Proceedings of the 16th ACM International Conference on Computing Frontiers, 2019

POSTER: Tango: An Optimizing Compiler for Just-In-Time RTL Simulation.

Blaise-Pascal Tine

Sudhakar Yalamanchili

Hyesoon Kim

Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018

Characterizing the performance benefit of hybrid memory system for HPC applications.

Parallel Comput., 2018

Aspen-based performance and energy modeling frameworks.

J. Parallel Distributed Comput., 2018

The future of scientific workflows.

Ewa Deelman

Tom Peterka

Ilkay Altintas

Kerstin Kleese van Dam

Int. J. High Perform. Comput. Appl., 2018

Siena: exploring the design space of heterogeneous memory systems.

Ivy Bo Peng

Proceedings of the International Conference for High Performance Computing, 2018

DRAGON: breaking GPU memory capacity limits with direct NVM access.

Proceedings of the International Conference for High Performance Computing, 2018

Juggler: a dependence-aware task-based execution framework for GPUs.

Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2018

Prometheus: Coherent Exploration of Hardware and Software Optimizations Using Aspen.

Proceedings of the 26th IEEE International Symposium on Modeling, 2018

NVIDIA Tensor Core Programmability, Performance & Precision.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops, 2018

Highly Efficient Compensation-Based Parallelism for Wavefront Loops on GPUs.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

GPU Data Access on Complex Geometries for D3Q19 Lattice Boltzmann Method.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Designing Algorithms for the EMU Migrating-threads-based Architecture.

Mehmet E. Belviranli

Proceedings of the 2018 IEEE High Performance Extreme Computing Conference, 2018

Tuyere: enabling scalable memory workloads for system exploration.

Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

2017

Architectures for the Post-Moore Era.

Erik P. DeBenedictis

Thomas M. Conte

IEEE Micro, 2017

PANORAMA: An approach to performance modeling and diagnosis of extreme-scale workflows.

Ewa Deelman

Int. J. High Perform. Comput. Appl., 2017

Performance Portability in Extreme Scale Computing (Dagstuhl Seminar 17431).

Dagstuhl Reports, 2017

Addressing Read-Disturbance Issue in STT-RAM by Data Compression and Selective Duplication.

Lei Jiang

IEEE Comput. Archit. Lett., 2017

Design and Analysis of Soft-Error Resilience Mechanisms for GPU Register File.

Proceedings of the 30th International Conference on VLSI Design and 16th International Conference on Embedded Systems, 2017

PapyrusKV: a high-performance parallel key-value store for distributed NVM architectures.

Proceedings of the International Conference for High Performance Computing, 2017

Durango: Scalable Synthetic Workload Generation for Extreme-Scale Application Performance Modeling and Simulation.

Proceedings of the 2017 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2017

Architecting SOT-RAM Based GPU Register File.

Mehdi Baradaran Tahoori

Adwait Jog

Proceedings of the 2017 IEEE Computer Society Annual Symposium on VLSI, 2017

Design and Implementation of Papyrus: Parallel Aggregate Persistent Storage.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Language-Based Optimizations for Persistence on Nonvolatile Main Memory Systems.

Joel Edward Denny

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Distributed workflows for modeling experimental data.

Vickie E. Lynch

Jose Borreguero Calvo

Ewa Deelman

Proceedings of the 2017 IEEE High Performance Extreme Computing Conference, 2017

2016

EqualWrites: Reducing Intra-Set Write Variations for Enhancing Lifetime of Non-Volatile Caches.

IEEE Trans. Very Large Scale Integr. Syst., 2016

A Survey Of Techniques for Architecting DRAM Caches.

IEEE Trans. Parallel Distributed Syst., 2016

A Survey of Software Techniques for Using Non-Volatile Memories for Storage and Main Memory Systems.

IEEE Trans. Parallel Distributed Syst., 2016

A Survey Of Architectural Approaches for Data Compression in Cache and Main Memory Systems.

IEEE Trans. Parallel Distributed Syst., 2016

A Survey of Techniques for Modeling and Improving Reliability of Computing Systems.

IEEE Trans. Parallel Distributed Syst., 2016

Reliability Tradeoffs in Design of Volatile and Nonvolatile Caches.

J. Circuits Syst. Comput., 2016

A Study of Power-Performance Modeling Using a Domain-Specific Language.

Proceedings of the 28th International Symposium on Computer Architecture and High Performance Computing, 2016

Improving DRAM Bandwidth Utilization with MLP-Aware OS Paging.

Rishiraj A. Bheda

Thomas M. Conte

Proceedings of the Second International Symposium on Memory Systems, 2016

Toward an End-to-End Framework for Modeling, Monitoring and Anomaly Detection for Scientific Workflows.

Justin M. LaPre

Brian Tierney

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

OpenACC to FPGA: A Framework for Directive-Based High-Performance Reconfigurable Computing.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

Algorithm-Directed Data Placement in Explicitly Managed Non-Volatile Memory.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Preparing for Supercomputing's Sixth Wave.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

IMPACC: A Tightly Integrated MPI+OpenACC Framework Exploiting Shared Memory Parallelism.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

NVL-C: Static Analysis Techniques for Efficient, Correct Programming of Non-Volatile Main Memory Systems.

Joel E. Denny

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

Reducing Soft-error Vulnerability of Caches using Data Compression.

Proceedings of the 26th edition on Great Lakes Symposium on VLSI, 2016

2015

A Survey Of Architectural Approaches for Managing Embedded DRAM and Non-Volatile On-Chip Caches.

IEEE Trans. Parallel Distributed Syst., 2015

Automated Design Space Exploration with Aspen.

Kyle L. Spafford

Sci. Program., 2015

Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures.

IEEE Micro, 2015

A Survey of CPU-GPU Heterogeneous Computing Techniques.

ACM Comput. Surv., 2015

Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing.

Comput. Sci. Eng., 2015

AYUSH: A Technique for Extending Lifetime of SRAM-NVM Hybrid Caches.

IEEE Comput. Archit. Lett., 2015

Examining recent many-core architectures and programming models using SHOC.

Proceedings of the 6th International Workshop on Performance Modeling, 2015

FITL: extending LLVM for the translation of fault-injection directives.

Joel E. Denny

Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, 2015

An OpenACC-based unified programming model for multi-accelerator systems.

Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2015

AYUSH: Extending Lifetime of SRAM-NVM Way-Based Hybrid Caches Using Wear-Leveling.

Proceedings of the 23rd IEEE International Symposium on Modeling, 2015

Enabling and Exploiting Flexible Task Assignment on GPU through SM-Centric Program Transformations.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

COMPASS: A Framework for Automated Performance Modeling and Prediction.

Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

Automated Characterization of Parallel Application Communication Patterns.

Philip C. Roth

Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing, 2015

DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches.

Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 2015

2014

A Survey of Methods for Analyzing and Improving GPU Energy Efficiency.

ACM Comput. Surv., 2014

neCODEC: nearline data compression for scientific applications.

Clust. Comput., 2014

BlackjackBench: Portable Hardware Characterization with Automated Results' Analysis.

Comput. J., 2014

Advanced Application Support for Improved GPU Utilization on Keeneland.

Proceedings of the Annual Conference of the Extreme Science and Engineering Discovery Environment, 2014

Quantitatively Modeling Application Resilience with the Data Vulnerability Factor.

Proceedings of the International Conference for High Performance Computing, 2014

Application characterization using Oxbow toolkit and PADS infrastructure.

Proceedings of the 1st International Workshop on Hardware-Software Co-Design for High Performance Computing, 2014

OpenARC: extensible OpenACC compiler framework for directive-based accelerator programming study.

Proceedings of the First Workshop on Accelerator Programming using Directives, 2014

EqualChance: Addressing Intra-set Write Variation to Increase Lifetime of Non-volatile Caches.

Proceedings of the 2nd Workshop on Interactions of NVM/Flash with Operating Systems and Workloads, 2014

Evaluating Performance Portability of OpenACC.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

LastingNVCache: A Technique for Improving the Lifetime of Non-volatile Caches.

Proceedings of the IEEE Computer Society Annual Symposium on VLSI, 2014

AsHES Keynote.

Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014

Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Improving energy efficiency of embedded DRAM caches for high-end computing systems.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

OpenARC: open accelerator research compiler for directive-based, efficient heterogeneous computing.

Proceedings of the 23rd International Symposium on High-Performance Parallel and Distributed Computing, 2014

WriteSmoothing: improving lifetime of non-volatile caches using intra-set wear-leveling.

Proceedings of the Great Lakes Symposium on VLSI 2014, GLSVLSI '14, Houston, TX, USA - May 21, 2014

SM-centric transformation: circumventing hardware restrictions for flexible GPU scheduling.

Proceedings of the International Conference on Parallel Architectures and Compilation, 2014

2013

Special Issue: Selected Papers from Super Computing 2012.

Padma Raghavan

Sci. Program., 2013

Modeling synthetic aperture radar computation with Aspen.

Int. J. High Perform. Comput. Appl., 2013

Quantifying Architectural Requirements of Contemporary Extreme-Scale Scientific Applications.

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Characterizing the Impact of Prefetching on Scientific Application Performance.

Gabriel Marin

Proceedings of the High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, 2013

Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach.

Proceedings of the International Conference for High Performance Computing, 2013

A Versatile Performance and Energy Simulation Tool for Composite GPU Global Memory.

Proceedings of the 2013 IEEE 21st International Symposium on Modelling, 2013

Diagnosis and optimization of application prefetching performance.

Gabriel Marin

Proceedings of the International Conference on Supercomputing, 2013

FlexiWay: A cache energy saving technique using fine-grained cache reconfiguration.

Zhao Zhang

Proceedings of the 2013 IEEE 31st International Conference on Computer Design, 2013

Evaluating the Viability of Application-Driven Cooperative CPU/GPU Fault Detection.

Proceedings of the Euro-Par 2013: Parallel Processing Workshops, 2013

Exploring hybrid memory for GPU energy efficiency through software-hardware co-design.

Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, 2013

Contemporary High Performance Computing - From Petascale toward Exascale.

Chapman and Hall / CRC computational science series, CRC Press, ISBN: 978-1-4665-6834-1, 2013

2012

BlackjackBench: portable hardware characterization.

SIGMETRICS Perform. Evaluation Rev., 2012

HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems.

J. Parallel Distributed Comput., 2012

Runtime Techniques to Enable a Highly-Scalable Global Address Space Model for Petascale Computing.

Int. J. Parallel Program., 2012

RXIO: Design and implementation of high performance RDMA-capable GridFTP.

Yuan Tian

Comput. Electr. Eng., 2012

Aspen: a domain specific language for performance modeling.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Early evaluation of directive-based GPU programming models for productive exascale computing.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

PCM-Based Durable Write Cache for Fast Disk I/O.

Proceedings of the 20th IEEE International Symposium on Modeling, 2012

Identifying Opportunities for Byte-Addressable Non-Volatile Memory in Extreme-Scale Scientific Applications.

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

Efficient Quality Threshold Clustering for Parallel Architectures.

Anthony Danalis

Proceedings of the 26th IEEE International Parallel and Distributed Processing Symposium, 2012

GA-GPU: extending a library-based global address space programming model for scalable heterogeneous computing systems.

Vinod Tipparaju

Proceedings of the Computing Frontiers Conference, CF'12, 2012

The tradeoffs of fused memory hierarchies in heterogeneous computing architectures.

Proceedings of the Computing Frontiers Conference, CF'12, 2012

2011

Performance Implications of Nonuniform Device Topologies in Scalable Heterogeneous Architectures.

IEEE Micro, 2011

The International Exascale Software Project roadmap.

Bertrand Braunschweig

Int. J. High Perform. Comput. Appl., 2011

Keeneland: Bringing Heterogeneous GPU Computing to the Computational Science Community.

Stephen Taylor McNally

Sudhakar Yalamanchili

Comput. Sci. Eng., 2011

MTAAP Introduction.

Luiz DeRose

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Virtual Topologies for Scalable Resource Management and Contention Attenuation in a Global Address Space Model on the Cray XT5.

Proceedings of the International Conference on Parallel Processing, 2011

Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort.

Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER), 2011

Network-Friendly One-Sided Communication through Multinode Cooperation on Petascale Cray XT5 Systems.

Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

Quantifying NUMA and contention effects in multi-GPU systems.

Aparna Chandramowlishwaran

Proceedings of 4th Workshop on General Purpose Processing on Graphics Processing Units, 2011

2010

On the Path to Exascale.

Int. J. Distributed Syst. Technol., 2010

Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems.

Comput. Sci. Res. Dev., 2010

Petascale Direct Numerical Simulation of Blood Flow on 200K Cores and Heterogeneous Architectures.

Abtin Rahimian

Ilya Lashuk

Shravan K. Veerapaneni

Proceedings of the Conference on High Performance Computing Networking, 2010

Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms.

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2010

Initial characterization of parallel NFS implementations.

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

MTAAP 2010 Welcome.

Luiz De Rose

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Efficient Zero-Copy Noncontiguous I/O for Globus on InfiniBand.

Yuan Tian

Proceedings of the 39th International Conference on Parallel Processing, 2010

Maestro: Data Orchestration and Tuning for OpenCL Devices.

Proceedings of the Euro-Par 2010 - Parallel Processing, 16th International Euro-Par Conference, Ischia, Italy, August 31, 2010

Enabling a highly-scalable global address space model for petascale computing.

Proceedings of the 7th Conference on Computing Frontiers, 2010

Toward exascale computational science with heterogeneous processing.

Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

The Scalable Heterogeneous Computing (SHOC) benchmark suite.

Proceedings of 3rd Workshop on General Purpose Processing on Graphics Processing Units, 2010

2009

Accuracy and performance of graphics processors: A Quantum Monte Carlo application case study.

Parallel Comput., 2009

Revolutionary technologies for acceleration of emerging petascale applications.

Rupak Biswas

Leonid Oliker

Parallel Comput., 2009

Design, implementation, and evaluation of transparent pNFS on Lustre.

Oleg Drokin

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

A Holistic Approach for Performance Measurement and Analysis for Petascale Applications.

Proceedings of the Computational Science, 2009

Accelerating S3D: A GPGPU Case Study.

Proceedings of the Euro-Par 2009, 2009

2008

Performance characteristics of biomolecular simulations on high-end systems with multi-core processors.

Pratul K. Agarwal

Parallel Comput., 2008

An Evaluation of the Oak Ridge National Laboratory Cray XT3.

Int. J. High Perform. Comput. Appl., 2008

DARPA's HPCS Program: History, Models, Tools, Languages.

Adv. Comput., 2008

Wide-area performance profiling of 10GigE and InfiniBand technologies.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

Early evaluation of IBM BlueGene/P.

Proceedings of the ACM/IEEE Conference on High Performance Computing, 2008

A Methodology for Developing High Fidelity Communication Models for Large-Scale Applications Targeted on Multicore Systems.

Proceedings of the 20th International Symposium on Computer Architecture and High Performance Computing, 2008

Experimental Analysis of InfiniBand Transport Services on WAN.

Nageswara S. V. Rao

Proceedings of The 2008 IEEE International Conference on Networking, 2008

Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors.

Alan L. Cox

Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2008

Performance characterization and optimization of parallel I/O on the Cray XT.

Sarp Oral

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Impact of multicores on large-scale molecular dynamics simulations.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

ParColl: Partitioned Collective I/O on the Cray XT.

Proceedings of the 2008 International Conference on Parallel Processing, 2008

HPC Interconnection Networks: The Key to Exascale Computing.

Proceedings of the High Speed and Large Scale Scientific Computing - Selected Papers from the High Performance Computing Workshop, Cetraro, Italy, June 30, 2008

Empirical Analysis of a Large-Scale Hierarchical Storage System.

Proceedings of the Euro-Par 2008, 2008

Xen-Based HPC: A Parallel I/O Perspective.

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2007

FASE: A Framework for Scalable Performance Prediction of HPC Systems and Applications.

Simul., 2007

Throughput Improvement of Molecular Dynamics Simulations Using Reconfigurable Computing.

Scalable Comput. Pract. Exp., 2007

A framework for performance analysis of Co-Array Fortran.

Bernd Mohr

Luiz De Rose

Concurr. Comput. Pract. Exp., 2007

Using FPGA Devices to Accelerate Biomolecular Simulations.

Computer, 2007

Performance evaluation of the Cray XT3 configured with dual core Opteron processors.

Richard F. Barrett

Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2007

Analysis of a Computational Biology Simulation Technique on Emerging Processing Architectures.

Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

An Exploration of Performance Attributes for Symbolic Modeling of Emerging Processing Devices.

Nikhil Bhatia

Proceedings of the High Performance Computing and Communications, 2007

Virtual Cluster Management with Xen.

Nikhil Bhatia

Proceedings of the Euro-Par 2007 Workshops: Parallel Processing, 2007

Balancing productivity and performance on the Cell Broadband Engine.

Proceedings of the 2007 IEEE International Conference on Cluster Computing, 2007

Exploiting Lustre File Joining for Effective Collective IO.

Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007

Sensitivity Analysis of Biomolecular Simulations using Symbolic Models.

Nikhil Bhatia

Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, 2007

An Application Specific Memory Characterization Technique for Co-processor Accelerators.

Soundarya Sivaramakrishnan

Melissa C. Smith

Proceedings of the IEEE International Conference on Application-Specific Systems, 2007

2006

Kernel-level single system image for petascale computing.

ACM SIGOPS Oper. Syst. Rev., 2006

Performance evaluation of high-speed interconnects using dense communication patterns.

Rod Fatoohi

Ken Kardys

Sumy Koshy

Parallel Comput., 2006

Performance characterization of molecular dynamics techniques for biomolecular simulations.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2006

Early evaluation of the Cray XT3.

Thomas H. Dunigan Jr.

Mark R. Fahey

Philip C. Roth

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

A framework to develop symbolic performance models of parallel applications.

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

Characterization of Scientific Workloads on Systems with Multi-Core Processors.

Proceedings of the 2006 IEEE International Symposium on Workload Characterization, 2006

An Analysis of System Balance Requirements for Scientific Applications.

Proceedings of the 2006 International Conference on Parallel Processing (ICPP 2006), 2006

Topic 2: Performance Prediction and Evaluation.

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

Hierarchical Model Validation of Symbolic Performance Models of Scientific Kernels.

Proceedings of the Euro-Par 2006, Parallel Processing, 12th International Euro-Par Conference, Dresden, Germany, August 28, 2006

IPMI-based Efficient Notification Framework for Large Scale Cluster Computing.

Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2006), 2006

2005

Performance Evaluation of the Cray X1 Distributed Shared-Memory Architecture.

Thomas H. Dunigan Jr.

James B. White III

IEEE Micro, 2005

Evaluating high-performance computers.

Bronis R. de Supinski

Lynn Kissel

John May

Sheila Vaidya

Concurr. Pract. Exp., 2005

Capturing Petascale Application Characteristics with the Sequoia Toolkit.

Proceedings of the Parallel Computing: Current & Future Issues of High-End Computing, 2005

Accelerating Scientific Applications with the SRC-6 Reconfigurable Computer: Methodologies and Analysis.

Melissa C. Smith

Xuejun Liang

Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005

Performance Evaluation of the SGI Altix 3700.

Thomas H. Dunigan

Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization.

Proceedings of the Computational Science, 2005

A Performance Measurement Infrastructure for Co-array Fortran.

Bernd Mohr

Luiz De Rose

Proceedings of the Euro-Par 2005, Parallel Processing, 11th International Euro-Par Conference, Lisbon, Portugal, August 30, 2005

Balancing FPGA Resource Utilities.

Proceedings of The 2005 International Conference on Engineering of Reconfigurable Systems and Algorithms, 2005

2004

Performance evaluation of the Cray X1 distributed shared memory architecture.

Thomas H. Dunigan Jr.

Proceedings of the 12th Annual IEEE Symposium on High Performance Interconnects, 2004

Topic 2: Performance Evaluation.

Proceedings of the Euro-Par 2004 Parallel Processing, 2004

2003

Communication characteristics of large-scale scientific applications for contemporary cluster architectures.

Frank Mueller

J. Parallel Distributed Comput., 2003

2002

Dynamic statistical profiling of communication activity in distributed applications.

Proceedings of the International Conference on Measurements and Modeling of Computer Systems, 2002

An empirical performance evaluation of scalable scientific applications.

Andy B. Yoo

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Asserting performance expectations.

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Scalable analysis techniques for microprocessor performance counter metrics.

Dong H. Ahn

Burkhard D. Steinmacher-Burow

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

An overview of the BlueGene/L Supercomputer.

Siddhartha Chatterjee

Karin Strauss

Christopher W. Surovic

Arun R. Umamaheshwaran

P. Verma

Pavlos Vranas

T. J. Christopher Ward

Proceedings of the 2002 ACM/IEEE conference on Supercomputing, 2002

Local Discovery of System Architecture - Application Parameter Sensitivity: An Empirical Technique for Adaptive Grid Applications.

Ivan Corey

John R. Johnson

Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), 2002

2001

An Integrated Performance Visualizer for MPI/OpenMP Programs.

Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

A Dynamic Tracing Mechanism for Performance Analysis of OpenMP Applications.

Proceedings of the OpenMP Shared Memory Parallel Programming, 2001

Statistical scalability analysis of communication operations in distributed applications.

Michael O. McCracken

Proceedings of the 2001 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP'01), 2001

2000

Real-Time Performance Monitoring, Adaptive Control, and Interactive Steering of Computational Grids.

Daniel A. Reed

Int. J. High Perform. Comput. Appl., 2000

Dynamic Software Testing of MPI Applications with Umpire.

Bronis R. de Supinski

Proceedings of Supercomputing 2000, 2000

Performance analysis of distributed applications using automatic classification of communication inefficiencies.

Proceedings of the 14th international conference on Supercomputing, 2000

Performance Issues in Parallel Processing Systems.

Proceedings of the Performance Evaluation: Origins and Directions, 2000

1999

Techniques for high-performance computational steering.

IEEE Concurr., 1999

Managing Performance Analysis with Dynamic Statistical Projection Pursuit.

Daniel A. Reed

Proceedings of the ACM/IEEE Conference on Supercomputing, 1999

Experiences with Computational Steering on Existing Scientific Applications.

Proceedings of the Ninth SIAM Conference on Parallel Processing for Scientific Computing, 1999

Optimizations for Language-Directed Computational Steering.

Proceedings of the 13th International Parallel Processing Symposium / 10th Symposium on Parallel and Distributed Processing (IPPS / SPDP '99), 1999

1998

From interactive applications to distributed laboratories.

IEEE Concurr., 1998

Falcon: On-line monitoring for steering parallel programs.

Concurr. Pract. Exp., 1998

Techniques for Delayed Binding of Monitoring Mechanisms to Application-Specific Instrumentation Points.

Proceedings of the 1998 International Conference on Parallel Processing (ICPP '98), 1998

Autopilot: Adaptive Control of Distributed Applications.

Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998

Computational Steering.

Eileen T. Kraemer

Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, 1998

1997

Computational Steering Annotated Bibliography.

ACM SIGPLAN Notices, 1997

High Performance Computational Steering of Physical Simulations.

Proceedings of the 11th International Parallel Processing Symposium (IPPS '97), 1997

1996

Models for computational steering.

Proceedings of the Third International Conference on Configurable Distributed Systems, 1996

1995

Progress: A Toolkit for Interactive Program Steering.

Proceedings of the 1995 International Conference on Parallel Processing, 1995

1994

An annotated bibliography of interactive program steering.

Weiming Gu