2024
Design Concerns for Integrated Scripting and Interactive Visualization in Notebook Environments.
IEEE Trans. Vis. Comput. Graph., September, 2024
HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages.
CoRR, 2024
From Pixels to Prose: A Large Dataset of Dense Image Captions.
CoRR, 2024
Performance-Aligned LLMs for Generating Fast Code.
CoRR, 2024
An Evaluative Comparison of Performance Portability across GPU Programming Models.
CoRR, 2024
Automated Programmatic Performance Analysis of Parallel Programs.
CoRR, 2024
A Large-Scale Epidemic Simulation Framework for Realistic Social Contact Networks.
CoRR, 2024
HPC-Coder: Modeling Parallel Programs using Large Language Models.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers.
Proceedings of the International Conference for High Performance Computing, 2024
A Probabilistic Approach To Selecting Build Configurations in Package Managers.
Proceedings of the International Conference for High Performance Computing, 2024
Predicting GPUDirect Benefits for HPC Workloads.
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024
Loki: Low-rank Keys for Efficient Sparse Attention.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Transformers Can Do Arithmetic with the Right Embeddings.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Learning to Predict and Improve Build Successes in Package Ecosystems.
Proceedings of the 21st IEEE/ACM International Conference on Mining Software Repositories, 2024
Predicting Cross-Architecture Performance of Parallel Programs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Can Large Language Models Write Parallel Code?
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024
ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems.
Proceedings of the 31st IEEE International Conference on High Performance Computing, 2024
Relative Performance Prediction Using Few-Shot Learning.
Proceedings of the 48th IEEE Annual Computers, Software, and Applications Conference, 2024
2023
Scalable Comparative Visualization of Ensembles of Call Graphs.
IEEE Trans. Vis. Comput. Graph., March, 2023
Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization.
CoRR, 2023
Modeling Parallel Programs using Large Language Models.
CoRR, 2023
Pipit: Enabling programmatic analysis of parallel execution traces.
CoRR, 2023
Communication-minimizing Asynchronous Tensor Parallelism.
CoRR, 2023
A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
CoRR, 2023
Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Proceedings of the 37th International Conference on Supercomputing, 2023
2022
Designing an Interactive, Notebook-Embedded, Tree Visualization to Support Exploratory Performance Analysis.
CoRR, 2022
Comparative Evaluation of Call Graph Generation by Profiling Tools.
Proceedings of the High Performance Computing - 37th International Conference, 2022
AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
Resource Utilization Aware Job Scheduling to Mitigate Performance Variability.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022
2021
UIUC-PPL/charm: Charm++ version 7.0.0.
Dataset, October, 2021
UIUC-PPL/charm: v7.0.0-rc2.
Dataset, September, 2021
UIUC-PPL/charm: v7.0.0-rc1.
Dataset, June, 2021
Visualizing Hierarchical Performance Profiles of Parallel Codes Using CallFlow.
IEEE Trans. Vis. Comput. Graph., 2021
How to Train Your Neural Network: A Comparative Evaluation.
CoRR, 2021
Myelin: An asynchronous, message-driven parallel framework for extreme-scale deep learning.
CoRR, 2021
A Simulation Study of Hardware Parameters for Future GPU-based HPC Platforms.
Proceedings of the IEEE International Performance, 2021
2020
UIUC-PPL/charm: v6.11.0-beta1.
Dataset, October, 2020
UIUC-PPL/charm: Charm++ version 6.10.2.
Dataset, August, 2020
Dataset, March, 2020
Dataset, February, 2020
Analytics of Longitudinal System Monitoring Data for Performance Prediction.
CoRR, 2020
Scalable Comparative Visualization of Ensembles of Call Graphs.
CoRR, 2020
Usability and Performance Improvements in Hatchet.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020
Auto-tuning Parameter Choices in HPC Applications using Bayesian Optimization.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
The Case of Performance Variability on Dragonfly-based Systems.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
End-to-end performance modeling of distributed GPU applications.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020
Predicting MPI Collective Communication Performance Using Machine Learning.
Proceedings of the IEEE International Conference on Cluster Computing, 2020
2019
UIUC-PPL/charm: v6.10.0-rc2.
Dataset, October, 2019
UIUC-PPL/charm: v6.10.0-rc.
Dataset, September, 2019
UIUC-PPL/charm: v6.10.0-beta1.
Dataset, August, 2019
Hatchet: pruning the overgrowth in parallel profiles.
Proceedings of the International Conference for High Performance Computing, 2019
Optimizing computation-communication overlap in asynchronous task-based programs: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019
Analyzing Cost-Performance Tradeoffs of HPC Network Designs under Different Constraints using Simulations.
Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2019
Optimizing computation-communication overlap in asynchronous task-based programs.
Proceedings of the ACM International Conference on Supercomputing, 2019
Evaluating the Impact of Energy Efficient Networks on HPC Workloads.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019
2018
MemAxes: Visualization and Analytics for Characterizing Complex Memory Performance Behaviors.
IEEE Trans. Vis. Comput. Graph., 2018
Interactive Investigation of Traffic Congestion on Fat-Tree Networks Using TreeScope.
Comput. Graph. Forum, 2018
Mitigating inter-job interference using adaptive flow-aware routing.
Proceedings of the International Conference for High Performance Computing, 2018
Evaluation of an interference-free node allocation policy on fat-tree clusters.
Proceedings of the International Conference for High Performance Computing, 2018
Visual Analytics Challenges in Analyzing Calling Context Trees.
Proceedings of the Programming and Performance Visualization Tools, 2018
PADDLE: Performance Analysis Using a Data-Driven Learning Environment.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018
Bootstrapping Parameter Space Exploration for Fast Tuning.
Proceedings of the 32nd International Conference on Supercomputing, 2018
Interference between I/O and MPI Traffic on Fat-tree Networks.
Proceedings of the 47th International Conference on Parallel Processing, 2018
2017
Massively parallel first-principles simulation of electron dynamics in materials.
J. Parallel Distributed Comput., 2017
Toward reliable validation of HPC network simulation models.
Proceedings of the 2017 Winter Simulation Conference, 2017
Performance modeling under resource constraints using deep transfer learning.
Proceedings of the International Conference for High Performance Computing, 2017
Predicting the performance impact of different fat-tree configurations.
Proceedings of the International Conference for High Performance Computing, 2017
ScrubJay: deriving knowledge from the disarray of HPC performance data.
Proceedings of the International Conference for High Performance Computing, 2017
Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017
Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration).
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017
Quantifying I/O and Communication Traffic Interference on Dragonfly Networks Equipped with Burst Buffers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017
Preliminary Performance Analysis of Multi-rail Fat-tree Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017
2016
Ordering Traces Logically to Identify Lateness in Message Passing Programs.
IEEE Trans. Parallel Distributed Syst., 2016
Data-Driven Performance Modeling of Linear Solvers for Sparse Matrices.
Proceedings of the 7th International Workshop on Performance Modeling, 2016
VIPACT: A Visualization Interface for Analyzing Calling Context Trees.
Proceedings of the Third Workshop on Visual Performance Analysis, 2016
Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree.
Proceedings of the International Conference for High Performance Computing, 2016
Evaluating HPC networks via simulation of parallel workloads.
Proceedings of the International Conference for High Performance Computing, 2016
A machine learning framework for performance coverage analysis of proxy applications.
Proceedings of the International Conference for High Performance Computing, 2016
LibPowerMon: A Lightweight Profiling Framework to Profile Program Context and System-Level Metrics.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016
Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016
2015
Recovering logical structure from Charm++ event traces.
Proceedings of the International Conference for High Performance Computing, 2015
Charm++ and MPI: Combining the Best of Both Worlds.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Identifying the Culprits Behind Network Congestion.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015
Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015
2014
Combing the Communication Hairball: Visualizing Parallel Execution Traces using Logical Time.
IEEE Trans. Vis. Comput. Graph., 2014
pF3D Simulations of Laser-Plasma Interactions in National Ignition Facility Experiments.
Comput. Sci. Eng., 2014
State of the Art of Performance Visualization.
Proceedings of the 16th Eurographics Conference on Visualization, 2014
Visualizing the five-dimensional torus network of the IBM Blue Gene/Q.
Proceedings of the First Workshop on Visual Performance Analysis, 2014
Maximizing Throughput on a Dragonfly Network.
Proceedings of the International Conference for High Performance Computing, 2014
Dissecting On-Node Memory Access Performance: A Semantic Approach.
Proceedings of the International Conference for High Performance Computing, 2014
RAHTM: Routing Algorithm Aware Hierarchical Task Mapping.
Proceedings of the International Conference for High Performance Computing, 2014
Extracting logical structure and identifying stragglers in parallel execution traces.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014
Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014
Optimizing the performance of parallel applications on a 5D torus via task mapping.
Proceedings of the 21st International Conference on High Performance Computing, 2014
2013
Predicting application performance using supervised learning on communication features.
Proceedings of the International Conference for High Performance Computing, 2013
There goes the neighborhood: performance degradation due to nearby jobs.
Proceedings of the International Conference for High Performance Computing, 2013
Performance Analysis Techniques for the Exascale Co-Design Process.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013
Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013
Scalable Molecular Dynamics with NAMD.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013
OpenAtom: Ab initio Molecular Dynamics for Petascale Platforms.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013
2012
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations.
IEEE Trans. Vis. Comput. Graph., 2012
Mapping applications with collectives over sub-communicators on torus networks.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
Novel views of performance data to analyze large-scale adaptive applications.
Proceedings of the SC Conference on High Performance Computing Networking, 2012
A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems.
Proceedings of the 41st International Conference on Parallel Processing, 2012
2011
NAMD (NAnoscale Molecular Dynamics).
Proceedings of the Encyclopedia of Parallel Computing, 2011
Topology Aware Task Mapping.
Proceedings of the Encyclopedia of Parallel Computing, 2011
Periodic hierarchical load balancing for large supercomputers.
Int. J. High Perform. Comput. Appl., 2011
Optimizing communication for Charm++ applications by reducing network contention.
Concurr. Comput. Pract. Exp., 2011
Improving communication performance in dense linear algebra via topology aware collectives.
Proceedings of the Conference on High Performance Computing Networking, 2011
Avoiding hot-spots on two-level direct networks.
Proceedings of the Conference on High Performance Computing Networking, 2011
Creating a Tool Set for Optimizing Topology-Aware Node Mappings.
Proceedings of the Tools for High Performance Computing 2011, 2011
Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011
Simulation-Based Performance Analysis and Tuning for a Two-Level Directly Connected System.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011
Heuristic-Based Techniques for Mapping Irregular Communication Graphs to Mesh Topologies.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011
Weighted locality-sensitive scheduling for mitigating noise on multi-core clusters.
Proceedings of the 18th International Conference on High Performance Computing, 2011
2010
Automating Topology Aware Mapping for Supercomputers.
PhD thesis, 2010
Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar.
Int. J. High Perform. Comput. Appl., 2010
Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers.
Proceedings of the 39th International Conference on Parallel Processing, 2010
Automated mapping of regular communication graphs on mesh interconnects.
Proceedings of the 2010 International Conference on High Performance Computing, 2010
2009
Quantifying Network Contention on Large Parallel Machines.
Parallel Process. Lett., 2009
Topology aware task mapping techniques: an API and case study.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009
An evaluative study on the effect of contention on message latencies in large supercomputers.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
Dynamic topology aware load balancing algorithms for molecular dynamics applications.
Proceedings of the 23rd international conference on Supercomputing, 2009
CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm.
Proceedings of the ICPPW 2009, 2009
A Case Study of Communication Optimizations on 3D Mesh Interconnects.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009
2008
Benefits of Topology Aware Mapping for Mesh Interconnects.
Parallel Process. Lett., 2008
Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system.
IBM J. Res. Dev., 2008
Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer.
IBM J. Res. Dev., 2008
Overcoming scaling challenges in biomolecular simulations across multiple platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
Application-specific topology-aware mapping for three dimensional topologies.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
2007
A Selective Profiling Tool: Towards Automatic Performance Tuning.
Proceedings of the 21th International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007