Murali Emani

Krishna Teja Chitty-Venkata

Xipeng Shen

Ali Jannesari

CoRR, 2022

FAIR for AI: An interdisciplinary, international, inclusive, and diverse community building perspective.

CoRR, 2022

XUnified: A Framework for Guiding Optimal Use of GPU Unified Memory.

IEEE Access, 2022

Neural Architecture Search for Transformers: A Survey.

Venkatram Vishwanath

Arun K. Somani

IEEE Access, 2022

Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study.

Chunhua Liao

Winson Chen

Hailu Xu

Proceedings of the Fourth International Conference on Transdisciplinary AI, 2022

AI Benchmarking for Science: Efforts from the MLCommons Science Working Group.

Christine R. Kirkpatrick

Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022

A Comprehensive Evaluation of Novel AI Accelerators for Deep Learning Workloads.

Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU.

Zhen Xie

Siddhisanket Raskar

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC.

Krishna Teja Chitty-Venkata

Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools, 2022

Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures.

Venkatram Vishwanath

Arun K. Somani

Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022, 2022

Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines.

Patrick J. Flynn

Proceedings of the Software Architecture. ECSA 2022 Tracks and Workshops, 2022

Early Experience with Transformer-Based Similarity Analysis for DataRaceBench.

Winson Chen

Chunhua Liao

Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

Towards neural architecture-aware exploration of compiler optimizations in a deep learning graph compiler.

Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

Toward an In-Depth Analysis of Multifidelity High Performance Computing Systems.

Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021

Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture.

Volodymyr V. Kindratenko

Anne C. Elster

Comput. Sci. Eng., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

CoRR, 2021

Stream-AI-MD: streaming AI-driven adaptive molecular simulations for heterogeneous computing platforms.

Proceedings of the PASC '21: Platform for Advanced Scientific Computing Conference, 2021

HPCFAIR: Enabling FAIR AI for HPC Applications.

Xipeng Shen

Barbara M. Chapman

Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing.

Chunhua Liao

Gaurav Verma

Md Abdullah Shahneous Bari

Zifan Nan

Xipeng Shen

Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.

Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

2020

EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.

Concurr. Comput. Pract. Exp., 2020

2019

Machine Learning Guided Optimal Use of GPU Unified Memory.

Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019

MELA: A Visual Analytics Tool for Studying Multifidelity HPC System Logs.

Proceedings of the 3rd IEEE/ACM Industry/University Joint International Workshop on Data-center Automation, 2019

2018

Data Placement Optimization in GPU Memory Hierarchy using Predictive Modeling.

Proceedings of the Workshop on Memory Centric High Performance Computing, 2018

Is Data Placement Optimization Still Relevant on Newer GPUs?

Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications.

Proceedings of the 25th European MPI Users' Group Meeting, 2018

Bootstrapping Parameter Space Exploration for Fast Tuning.

Jayaraman J. Thiagarajan

Proceedings of the 32nd International Conference on Supercomputing, 2018

2016

Mapping Medley: Adaptive Parallelism Mapping with Varying Optimization Goals.

Proceedings of the Languages and Compilers for Parallel Computing, 2016

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding.

Govind Sreekar Shenoy

Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015

Adaptive parallelism mapping in dynamic environments using machine learning.

PhD thesis, 2015

Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments.

Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

2014

Change Detection Based Parallelism Mapping: Exploiting Offline Models and Online Adaptation.

Proceedings of the Languages and Compilers for Parallel Computing, 2014

2013

Smart, adaptive mapping of parallelism in the presence of external workload.

Zheng Wang