Khaled Z. Ibrahim

Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training.

Jonghyun Bae

Jong Youl Choi

Massimiliano Lupo Pasini

Kshitij Mehta

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023

Exploring temporal community evolution: algorithmic approaches and parallel optimization for dynamic community detection.

Appl. Netw. Sci., December, 2023

2022

Enhancing scalability of a matrix-free eigensolver for studying many-body localization.

Roel Van Beeumen

Gregory D. Kahanamoku-Meyer

Norman Y. Yao

Chao Yang

Int. J. High Perform. Comput. Appl., 2022

ML-based Performance Portability for Time-Dependent Density Functional Theory in HPC Environments.

Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs.

Chao Yang

Pieter Maris

Proceedings of the IEEE/ACM International Workshop on Performance, 2022

Preprocessing Pipeline Optimization for Scientific Deep Learning Workloads.

Leonid Oliker

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

Architectural Requirements for Deep Learning Workloads in HPC Environments.

Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures.

Hadia Ahmed

David B. Williams-Young

Chao Yang

Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

CSPACER: A Reduced API Set Runtime for the Space Consistency Model.

Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021

2020

Tuning floating-point precision using dynamic program information and temporal locality.

Proceedings of the International Conference for High Performance Computing, 2020

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches.

Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

2019

Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers.

Int. J. High Perform. Comput. Appl., 2019

Performance analysis of deep learning workloads using roofline trajectories.

M. Haseeb Javed

Xiaoyi Lu

CCF Trans. High Perform. Comput., 2019

Toward a Programmable Analysis and Visualization Framework for Interactive Performance Analytics.

Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Optimizing Breadth-First Search at Scale Using Hardware-Accelerated Space Consistency.

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Performance Analysis of GPU Programming Models Using the Roofline Scaling Trajectories.

Samuel Williams

Leonid Oliker

Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

2018

Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis.

Samuel Williams

Leonid Oliker

Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

2017

Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends.

J. Parallel Distributed Comput., 2017

Reaching bandwidth saturation using transparent injection parallelization.

Int. J. High Perform. Comput. Appl., 2017

APHiD: Hierarchical Task Placement to Enable a Tapered Fat Tree Topology for Lower Power and Cost in HPC Networks.

George Michelogiannakis

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016

Scaling Spark on Lustre.

Proceedings of the High Performance Computing, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.

Carlos Rosales-Fernandez

Timothy J. Williams

Proceedings of the International Conference for High Performance Computing, 2016

Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes.

Farzad Fatollahi-Fard

David Donofrio

John Shalf

Proceedings of the Second International Symposium on Memory Systems, 2016

Scaling Spark on HPC Systems.

Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015

Exploiting communication concurrency on high performance computing systems.

Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

2014

The Case for Partitioning Virtual Machines on Multicore Architectures.

Steven A. Hofmeyr

Costin Iancu

IEEE Trans. Parallel Distributed Syst., 2014

Efficient Interoperability of OpenSHMEM on Multicore Architectures.

Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect.

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

On the conditions for efficient interoperability with threads: an experience with PGAS languages using cray communication domains.

Katherine A. Yelick

Proceedings of the 2014 International Conference on Supercomputing, 2014

Analysis and tuning of libtensor framework on multicore architectures.

Proceedings of the 21st International Conference on High Performance Computing, 2014

2013

Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms.

Int. J. High Perform. Comput. Appl., 2013

Kinetic turbulence simulations at extreme scale on leadership-class systems.

Proceedings of the International Conference for High Performance Computing, 2013

2012

Code Development of High-Performance Applications for Power-Efficient Architectures.

Proceedings of the Handbook of Energy-Aware and Green Computing - Two Volume Set., 2012

Poster: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.

Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Congestion avoidance on manycore high performance computing systems.

Proceedings of the International Conference on Supercomputing, 2012

Concurrent Phase Classification for Accelerating MPSoC Simulation.

Proceedings of the ARCS 2012 Workshops, February 28 - March 2, 2012, Munich, Germany, 2012

2011

Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms.

Parallel Comput., 2011

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems.

Proceedings of the Conference on High Performance Computing Networking, 2011

Optimized pre-copy live migration for memory intensive applications.

Proceedings of the Conference on High Performance Computing Networking, 2011

Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines.

Steven A. Hofmeyr

Costin Iancu

Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010

Parallel application sampling for accelerating MPSoC simulation.

Des. Autom. Embed. Syst., 2010

Characterizing the Relation Between Apex-Map Synthetic Probes and Reuse Distance Distributions.

Erich Strohmaier

Proceedings of the 39th International Conference on Parallel Processing, 2010

Bridging the gap between complex software paradigms and power-efficient parallel architectures.

Proceedings of the International Green Computing Conference 2010, 2010

2009

Power-Aware Bus Coscheduling for Periodic Realtime Applications Running on Multiprocessor SoC.

Trans. High Perform. Embed. Archit. Compil., 2009

Efficient SIMDization and data management of the Lattice QCD computation on the Cell Broadband Engine.

François Bodin

Sci. Program., 2009

2008

Fine-grained parallelization of lattice QCD kernel routine on GPUs.

François Bodin

Olivier Pène

J. Parallel Distributed Comput., 2008

Implementing Wilson-Dirac operator on the cell broadband engine.

François Bodin

Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Multi-granularity sampling for simulating concurrent heterogeneous applications.

Proceedings of the 2008 International Conference on Compilers, 2008

2007

Adaptive Sampling for Efficient MPSoC Architecture Simulation.

Proceedings of the 15th International Symposium on Modeling, 2007

2005

Correlation between Detailed and Simplified Simulations in Studying Multiprocessor Architecture.

Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Efficient Architectural Support for Secure Bus-Based Shared Memory Multiprocessor.

Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

2003

Extending OpenMP to Support Slipstream Execution Mode.

Gregory T. Byrd

Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Slipstream Execution Mode for CMP-Based Multiprocessors.

Gregory T. Byrd

Eric Rotenberg

Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2001

On the Exploitation of Value Predication and Producer Identification to Reduce Barrier Synchronization Time.
