Abhinav Bhatele

IEEE Trans. Vis. Comput. Graph., September, 2024

From Pixels to Prose: A Large Dataset of Dense Image Captions.

CoRR, 2024

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs.

CoRR, 2024

Loki: Low-Rank Keys for Efficient Sparse Attention.

CoRR, 2024

Transformers Can Do Arithmetic with the Right Embeddings.

CoRR, 2024

Performance-Aligned LLMs for Generating Fast Code.

CoRR, 2024

An Evaluative Comparison of Performance Portability across GPU Programming Models.

Joshua Hoke Davis

Pranav Sivaraman

Isaac Minn

Konstantinos Parasyris

Harshitha Menon

Giorgis Georgakoudis

CoRR, 2024

Automated Programmatic Performance Analysis of Parallel Programs.

Onur Cankur

Aditya Tomar

Daniel Nichols

Katherine E. Isaacs

CoRR, 2024

A Large-Scale Epidemic Simulation Framework for Realistic Social Contact Networks.

CoRR, 2024

HPC-Coder: Modeling Parallel Programs using Large Language Models.

Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers.

Proceedings of the International Conference for High Performance Computing, 2024

A Probabilistic Approach To Selecting Build Configurations in Package Managers.

Proceedings of the International Conference for High Performance Computing, 2024

Predicting GPUDirect Benefits for HPC Workloads.

Proceedings of the 32nd Euromicro International Conference on Parallel, 2024

Learning to Predict and Improve Build Successes in Package Ecosystems.

Proceedings of the 21st IEEE/ACM International Conference on Mining Software Repositories, 2024

Predicting Cross-Architecture Performance of Parallel Programs.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Can Large Language Models Write Parallel Code?

Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

Relative Performance Prediction Using Few-Shot Learning.

Proceedings of the 48th IEEE Annual Computers, Software, and Applications Conference, 2024

2023

Scalable Comparative Visualization of Ensembles of Call Graphs.

IEEE Trans. Vis. Comput. Graph., March, 2023

ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems.

CoRR, 2023

Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization.

Zachary Sating

CoRR, 2023

Modeling Parallel Programs using Large Language Models.

CoRR, 2023

Pipit: Enabling programmatic analysis of parallel execution traces.

CoRR, 2023

Communication-minimizing Asynchronous Tensor Parallelism.

Zack Sating

CoRR, 2023

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.

CoRR, 2023

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms.

Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.

Proceedings of the 37th International Conference on Supercomputing, 2023

2022

Designing an Interactive, Notebook-Embedded, Tree Visualization to Support Exploratory Performance Analysis.

CoRR, 2022

Comparative Evaluation of Call Graph Generation by Profiling Tools.

Onur Cankur

Proceedings of the High Performance Computing - 37th International Conference, 2022

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Resource Utilization Aware Job Scheduling to Mitigate Performance Variability.

Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021

UIUC-PPL/charm: Charm++ version 7.0.0.

Dataset, October, 2021

UIUC-PPL/charm: v7.0.0-rc2.

Dataset, September, 2021

UIUC-PPL/charm: v7.0.0-rc1.

Dataset, June, 2021

Visualizing Hierarchical Performance Profiles of Parallel Codes Using CallFlow.

IEEE Trans. Vis. Comput. Graph., 2021

How to Train Your Neural Network: A Comparative Evaluation.

CoRR, 2021

Myelin: An asynchronous, message-driven parallel framework for extreme-scale deep learning.

CoRR, 2021

A Simulation Study of Hardware Parameters for Future GPU-based HPC Platforms.

Proceedings of the IEEE International Performance, 2021

2020

UIUC-PPL/charm: v6.11.0-beta1.

Dataset, October, 2020

UIUC-PPL/charm: Charm++ version 6.10.2.

Dataset, August, 2020

UIUC-PPL/charm: v6.10.1.

Dataset, March, 2020

UIUC-PPL/charm: v6.10.0.

Dataset, February, 2020

Analytics of Longitudinal System Monitoring Data for Performance Prediction.

Ian J. Costello

CoRR, 2020

Scalable Comparative Visualization of Ensembles of Call Graphs.

CoRR, 2020

Usability and Performance Improvements in Hatchet.

Stephanie Brink

Ian Lumsden

Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

Auto-tuning Parameter Choices in HPC Applications using Bayesian Optimization.

Harshitha Menon

Todd Gamblin

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

The Case of Performance Variability on Dragonfly-based Systems.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

End-to-end performance modeling of distributed GPU applications.

Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Predicting MPI Collective Communication Performance Using Machine Learning.

Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019

UIUC-PPL/charm: v6.10.0-rc2.

Dataset, October, 2019

UIUC-PPL/charm: v6.10.0-rc.

Dataset, September, 2019

UIUC-PPL/charm: v6.10.0-beta1.

Dataset, August, 2019

Hatchet: pruning the overgrowth in parallel profiles.

Stephanie Brink

Todd Gamblin

Proceedings of the International Conference for High Performance Computing, 2019

Optimizing computation-communication overlap in asynchronous task-based programs: poster.

Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Analyzing Cost-Performance Tradeoffs of HPC Network Designs under Different Constraints using Simulations.

Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2019

Optimizing computation-communication overlap in asynchronous task-based programs.

Proceedings of the ACM International Conference on Supercomputing, 2019

Evaluating the Impact of Energy Efficient Networks on HPC Workloads.

Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

2018

MemAxes: Visualization and Analytics for Characterizing Complex Memory Performance Behaviors.

IEEE Trans. Vis. Comput. Graph., 2018

Interactive Investigation of Traffic Congestion on Fat-Tree Networks Using TreeScope.

Comput. Graph. Forum, 2018

Mitigating inter-job interference using adaptive flow-aware routing.

Proceedings of the International Conference for High Performance Computing, 2018

Evaluation of an interference-free node allocation policy on fat-tree clusters.

Proceedings of the International Conference for High Performance Computing, 2018

Visual Analytics Challenges in Analyzing Calling Context Trees.

Proceedings of the Programming and Performance Visualization Tools, 2018

PADDLE: Performance Analysis Using a Data-Driven Learning Environment.

Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Bootstrapping Parameter Space Exploration for Fast Tuning.

Proceedings of the 32nd International Conference on Supercomputing, 2018

Interference between I/O and MPI Traffic on Fat-tree Networks.

Proceedings of the 47th International Conference on Parallel Processing, 2018

2017

Massively parallel first-principles simulation of electron dynamics in materials.

J. Parallel Distributed Comput., 2017

Toward reliable validation of HPC network simulation models.

Kwan-Liu Ma

Robert B. Ross

Proceedings of the 2017 Winter Simulation Conference, 2017

Performance modeling under resource constraints using deep transfer learning.

Proceedings of the International Conference for High Performance Computing, 2017

Predicting the performance impact of different fat-tree configurations.

Proceedings of the International Conference for High Performance Computing, 2017

ScrubJay: deriving knowledge from the disarray of HPC performance data.

Proceedings of the International Conference for High Performance Computing, 2017

Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference.

Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration).

Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017

Quantifying I/O and Communication Traffic Interference on Dragonfly Networks Equipped with Burst Buffers.

Kwan-Liu Ma

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Preliminary Performance Analysis of Multi-rail Fat-tree Networks.

Robert B. Ross

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks.

Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016

Ordering Traces Logically to Identify Lateness in Message Passing Programs.

IEEE Trans. Parallel Distributed Syst., 2016

Data-Driven Performance Modeling of Linear Solvers for Sparse Matrices.

Jae-Seung Yeom

Greg Bronevetsky

Tzanio V. Kolev

Proceedings of the 7th International Workshop on Performance Modeling, 2016

VIPACT: A Visualization Interface for Analyzing Calling Context Trees.

Proceedings of the Third Workshop on Visual Performance Analysis, 2016

Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree.

Proceedings of the International Conference for High Performance Computing, 2016

Evaluating HPC networks via simulation of parallel workloads.

Proceedings of the International Conference for High Performance Computing, 2016

A machine learning framework for performance coverage analysis of proxy applications.

Tanzima Z. Islam

Martin Schulz

Todd Gamblin

Proceedings of the International Conference for High Performance Computing, 2016

LibPowerMon: A Lightweight Profiling Framework to Profile Program Context and System-Level Metrics.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers.

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015

Recovering logical structure from Charm++ event traces.

Proceedings of the International Conference for High Performance Computing, 2015

Charm++ and MPI: Combining the Best of Both Worlds.

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Identifying the Culprits Behind Network Congestion.

Andrew R. Titus

Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations.

Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

2014

Combing the Communication Hairball: Visualizing Parallel Execution Traces using Logical Time.

IEEE Trans. Vis. Comput. Graph., 2014

pF3D Simulations of Laser-Plasma Interactions in National Ignition Facility Experiments.

Steven H. Langer

Charles H. Still

Comput. Sci. Eng., 2014

State of the Art of Performance Visualization.

Proceedings of the 16th Eurographics Conference on Visualization, 2014

Visualizing the five-dimensional torus network of the IBM Blue Gene/Q.

Proceedings of the First Workshop on Visual Performance Analysis, 2014

Maximizing Throughput on a Dragonfly Network.

Proceedings of the International Conference for High Performance Computing, 2014

Dissecting On-Node Memory Access Performance: A Semantic Approach.

Proceedings of the International Conference for High Performance Computing, 2014

RAHTM: Routing Algorithm Aware Hierarchical Task Mapping.

Ahmed H. Abdel-Gawad

Mithuna Thottethodi

Dimitrios S. Nikolopoulos

Proceedings of the International Conference for High Performance Computing, 2014

Extracting logical structure and identifying stragglers in parallel execution traces.

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters.

Martin Schulz

Lukasz Wesolowski

Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing the performance of parallel applications on a 5D torus via task mapping.

Proceedings of the 21st International Conference on High Performance Computing, 2014

2013

Predicting application performance using supervised learning on communication features.

Proceedings of the International Conference for High Performance Computing, 2013

There goes the neighborhood: performance degradation due to nearby jobs.

Proceedings of the International Conference for High Performance Computing, 2013

Performance Analysis Techniques for the Exascale Co-Design Process.

Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application.

Ian Karlin

Jeff Keasler

Bradford L. Chamberlain

Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Scalable Molecular Dynamics with NAMD.

Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach, 2013

OpenAtom: Ab initio Molecular Dynamics for Petascale Platforms.

Glenn J. Martyna

Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach, 2013

2012

Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations.

IEEE Trans. Vis. Comput. Graph., 2012

Mapping applications with collectives over sub-communicators on torus networks.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

Novel views of performance data to analyze large-scale adaptive applications.

Proceedings of the SC Conference on High Performance Computing Networking, 2012

A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems.

Laércio Lima Pilla

Christiane Pousa Ribeiro

Daniel Cordeiro

Chao Mei

Philippe Olivier Alexandre Navaux

François Broquedis

Jean-François Méhaut

Proceedings of the 41st International Conference on Parallel Processing, 2012

2011

NAMD (NAnoscale Molecular Dynamics).

Proceedings of the Encyclopedia of Parallel Computing, 2011

Topology Aware Task Mapping.

Proceedings of the Encyclopedia of Parallel Computing, 2011

Periodic hierarchical load balancing for large supercomputers.

Int. J. High Perform. Comput. Appl., 2011

Optimizing communication for Charm++ applications by reducing network contention.

Concurr. Comput. Pract. Exp., 2011

Improving communication performance in dense linear algebra via topology aware collectives.

Edgar Solomonik

James Demmel

Proceedings of the Conference on High Performance Computing Networking, 2011

Avoiding hot-spots on two-level direct networks.

Proceedings of the Conference on High Performance Computing Networking, 2011

Creating a Tool Set for Optimizing Topology-Aware Node Mappings.

Proceedings of the Tools for High Performance Computing 2011, 2011

Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes.

Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Simulation-Based Performance Analysis and Tuning for a Two-Level Directly Connected System.

Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Heuristic-Based Techniques for Mapping Irregular Communication Graphs to Mesh Topologies.

Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Weighted locality-sensitive scheduling for mitigating noise on multi-core clusters.

Vivek Kale

William D. Gropp

Proceedings of the 18th International Conference on High Performance Computing, 2011

2010

Automating Topology Aware Mapping for Supercomputers.

PhD thesis, 2010

Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar.

Int. J. High Perform. Comput. Appl., 2010

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers.

Proceedings of the 39th International Conference on Parallel Processing, 2010

Automated mapping of regular communication graphs on mesh interconnects.

Proceedings of the 2010 International Conference on High Performance Computing, 2010

2009

Quantifying Network Contention on Large Parallel Machines.

Parallel Process. Lett., 2009

Topology aware task mapping techniques: an API and case study.

Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

An evaluative study on the effect of contention on message latencies in large supercomputers.

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Dynamic topology aware load balancing algorithms for molecular dynamics applications.

Sameer Kumar

Proceedings of the 23rd International Conference on Supercomputing, 2009

CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm.

Proceedings of the ICPPW 2009, 2009

A Case Study of Communication Optimizations on 3D Mesh Interconnects.

Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008

Benefits of Topology Aware Mapping for Mesh Interconnects.

Parallel Process. Lett., 2008

Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system.

IBM J. Res. Dev., 2008

Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer.

IBM J. Res. Dev., 2008

Overcoming scaling challenges in biomolecular simulations across multiple platforms.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Application-specific topology-aware mapping for three dimensional topologies.

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007

A Selective Profiling Tool: Towards Automatic Performance Tuning.
