Abhinav Bhatele

ORCID: 0000-0003-3069-3701

According to our database, Abhinav Bhatele authored at least 132 papers between 2007 and 2024.

Bibliography

2024
Design Concerns for Integrated Scripting and Interactive Visualization in Notebook Environments.
IEEE Trans. Vis. Comput. Graph., September, 2024

From Pixels to Prose: A Large Dataset of Dense Image Captions.
CoRR, 2024

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs.
CoRR, 2024

Loki: Low-Rank Keys for Efficient Sparse Attention.
CoRR, 2024

Transformers Can Do Arithmetic with the Right Embeddings.
CoRR, 2024

Performance-Aligned LLMs for Generating Fast Code.
CoRR, 2024

An Evaluative Comparison of Performance Portability across GPU Programming Models.
CoRR, 2024

Automated Programmatic Performance Analysis of Parallel Programs.
CoRR, 2024

A Large-Scale Epidemic Simulation Framework for Realistic Social Contact Networks.
CoRR, 2024

HPC-Coder: Modeling Parallel Programs using Large Language Models.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

Predicting GPUDirect Benefits for HPC Workloads.
Proceedings of the 32nd Euromicro International Conference on Parallel, 2024

Learning to Predict and Improve Build Successes in Package Ecosystems.
Proceedings of the 21st IEEE/ACM International Conference on Mining Software Repositories, 2024

Predicting Cross-Architecture Performance of Parallel Programs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Can Large Language Models Write Parallel Code?
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, 2024

Relative Performance Prediction Using Few-Shot Learning.
Proceedings of the 48th IEEE Annual Computers, Software, and Applications Conference, 2024

2023
Scalable Comparative Visualization of Ensembles of Call Graphs.
IEEE Trans. Vis. Comput. Graph., March, 2023

ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems.
CoRR, 2023

Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization.
CoRR, 2023

Modeling Parallel Programs using Large Language Models.
CoRR, 2023

Pipit: Enabling programmatic analysis of parallel execution traces.
CoRR, 2023

Communication-minimizing Asynchronous Tensor Parallelism.
CoRR, 2023

A Novel Tensor-Expert Hybrid Parallelism Approach to Scale Mixture-of-Experts Training.
CoRR, 2023

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Porting a Computational Fluid Dynamics Code with AMR to Large-scale GPU Platforms.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training.
Proceedings of the 37th International Conference on Supercomputing, 2023

2022
Designing an Interactive, Notebook-Embedded, Tree Visualization to Support Exploratory Performance Analysis.
CoRR, 2022

Comparative Evaluation of Call Graph Generation by Profiling Tools.
Proceedings of the High Performance Computing - 37th International Conference, 2022

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

Resource Utilization Aware Job Scheduling to Mitigate Performance Variability.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
Visualizing Hierarchical Performance Profiles of Parallel Codes Using CallFlow.
IEEE Trans. Vis. Comput. Graph., 2021

How to Train Your Neural Network: A Comparative Evaluation.
CoRR, 2021

Myelin: An asynchronous, message-driven parallel framework for extreme-scale deep learning.
CoRR, 2021

A Simulation Study of Hardware Parameters for Future GPU-based HPC Platforms.
Proceedings of the IEEE International Performance, 2021

2020
Analytics of Longitudinal System Monitoring Data for Performance Prediction.
CoRR, 2020

Scalable Comparative Visualization of Ensembles of Call Graphs.
CoRR, 2020

Usability and Performance Improvements in Hatchet.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools and Workshop on Programming and Performance Visualization Tools, 2020

Auto-tuning Parameter Choices in HPC Applications using Bayesian Optimization.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

The Case of Performance Variability on Dragonfly-based Systems.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

End-to-end performance modeling of distributed GPU applications.
Proceedings of the ICS '20: 2020 International Conference on Supercomputing, 2020

Predicting MPI Collective Communication Performance Using Machine Learning.
Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019
Hatchet: pruning the overgrowth in parallel profiles.
Proceedings of the International Conference for High Performance Computing, 2019

Optimizing computation-communication overlap in asynchronous task-based programs: poster.
Proceedings of the 24th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2019

Analyzing Cost-Performance Tradeoffs of HPC Network Designs under Different Constraints using Simulations.
Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation, 2019

Optimizing computation-communication overlap in asynchronous task-based programs.
Proceedings of the ACM International Conference on Supercomputing, 2019

Evaluating the Impact of Energy Efficient Networks on HPC Workloads.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

2018
MemAxes: Visualization and Analytics for Characterizing Complex Memory Performance Behaviors.
IEEE Trans. Vis. Comput. Graph., 2018

Interactive Investigation of Traffic Congestion on Fat-Tree Networks Using TreeScope.
Comput. Graph. Forum, 2018

Mitigating inter-job interference using adaptive flow-aware routing.
Proceedings of the International Conference for High Performance Computing, 2018

Evaluation of an interference-free node allocation policy on fat-tree clusters.
Proceedings of the International Conference for High Performance Computing, 2018

Visual Analytics Challenges in Analyzing Calling Context Trees.
Proceedings of the Programming and Performance Visualization Tools, 2018

PADDLE: Performance Analysis Using a Data-Driven Learning Environment.
Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium, 2018

Bootstrapping Parameter Space Exploration for Fast Tuning.
Proceedings of the 32nd International Conference on Supercomputing, 2018

Interference between I/O and MPI Traffic on Fat-tree Networks.
Proceedings of the 47th International Conference on Parallel Processing, 2018

2017
Massively parallel first-principles simulation of electron dynamics in materials.
J. Parallel Distributed Comput., 2017

Toward reliable validation of HPC network simulation models.
Proceedings of the 2017 Winter Simulation Conference, 2017

Performance modeling under resource constraints using deep transfer learning.
Proceedings of the International Conference for High Performance Computing, 2017

Predicting the performance impact of different fat-tree configurations.
Proceedings of the International Conference for High Performance Computing, 2017

ScrubJay: deriving knowledge from the disarray of HPC performance data.
Proceedings of the International Conference for High Performance Computing, 2017

Partitioning Low-Diameter Networks to Eliminate Inter-Job Interference.
Proceedings of the 2017 IEEE International Parallel and Distributed Processing Symposium, 2017

Accelerating Big Data Infrastructure and Applications (Ongoing Collaboration).
Proceedings of the 37th IEEE International Conference on Distributed Computing Systems Workshops, 2017

Quantifying I/O and Communication Traffic Interference on Dragonfly Networks Equipped with Burst Buffers.
Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

Preliminary Performance Analysis of Multi-rail Fat-tree Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Ordering Traces Logically to Identify Lateness in Message Passing Programs.
IEEE Trans. Parallel Distributed Syst., 2016

Data-Driven Performance Modeling of Linear Solvers for Sparse Matrices.
Proceedings of the 7th International Workshop on Performance Modeling, 2016

VIPACT: A Visualization Interface for Analyzing Calling Context Trees.
Proceedings of the Third Workshop on Visual Performance Analysis, 2016

Characterizing parallel scientific applications on commodity clusters: an empirical study of a tapered fat-tree.
Proceedings of the International Conference for High Performance Computing, 2016

Evaluating HPC networks via simulation of parallel workloads.
Proceedings of the International Conference for High Performance Computing, 2016

A machine learning framework for performance coverage analysis of proxy applications.
Proceedings of the International Conference for High Performance Computing, 2016

LibPowerMon: A Lightweight Profiling Framework to Profile Program Context and System-Level Metrics.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers.
Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium, 2016

2015
Recovering logical structure from Charm++ event traces.
Proceedings of the International Conference for High Performance Computing, 2015

Charm++ and MPI: Combining the Best of Both Worlds.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Identifying the Culprits Behind Network Congestion.
Proceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium, 2015

Preliminary Evaluation of a Parallel Trace Replay Tool for HPC Network Simulations.
Proceedings of the Euro-Par 2015: Parallel Processing Workshops, 2015

2014
Combing the Communication Hairball: Visualizing Parallel Execution Traces using Logical Time.
IEEE Trans. Vis. Comput. Graph., 2014

pF3D Simulations of Laser-Plasma Interactions in National Ignition Facility Experiments.
Comput. Sci. Eng., 2014

State of the Art of Performance Visualization.
Proceedings of the 16th Eurographics Conference on Visualization, 2014

Visualizing the five-dimensional torus network of the IBM blue gene/Q.
Proceedings of the First Workshop on Visual Performance Analysis, 2014

Maximizing Throughput on a Dragonfly Network.
Proceedings of the International Conference for High Performance Computing, 2014

Dissecting On-Node Memory Access Performance: A Semantic Approach.
Proceedings of the International Conference for High Performance Computing, 2014

RAHTM: Routing Algorithm Aware Hierarchical Task Mapping.
Proceedings of the International Conference for High Performance Computing, 2014

Extracting logical structure and identifying stragglers in parallel execution traces.
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2014

Overcoming the Scalability Challenges of Epidemic Simulations on Blue Waters.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

Optimizing the performance of parallel applications on a 5D torus via task mapping.
Proceedings of the 21st International Conference on High Performance Computing, 2014

2013
Predicting application performance using supervised learning on communication features.
Proceedings of the International Conference for High Performance Computing, 2013

There goes the neighborhood: performance degradation due to nearby jobs.
Proceedings of the International Conference for High Performance Computing, 2013

Performance Analysis Techniques for the Exascale Co-Design Process.
Proceedings of the Parallel Computing: Accelerating Computational Science and Engineering (CSE), 2013

Exploring Traditional and Emerging Parallel Programming Models Using a Proxy Application.
Proceedings of the 27th IEEE International Symposium on Parallel and Distributed Processing, 2013

Scalable Molecular Dynamics with NAMD.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

OpenAtom: Ab initio Molecular Dynamics for Petascale Platforms.
Proceedings of the Parallel Science and Engineering Applications - The Charm++ Approach., 2013

2012
Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations.
IEEE Trans. Vis. Comput. Graph., 2012

Mapping applications with collectives over sub-communicators on torus networks.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

Novel views of performance data to analyze large-scale adaptive applications.
Proceedings of the SC Conference on High Performance Computing Networking, 2012

A Hierarchical Approach for Load Balancing on Parallel Multi-core Systems.
Proceedings of the 41st International Conference on Parallel Processing, 2012

2011
NAMD (NAnoscale Molecular Dynamics).
Proceedings of the Encyclopedia of Parallel Computing, 2011

Topology Aware Task Mapping.
Proceedings of the Encyclopedia of Parallel Computing, 2011

Periodic hierarchical load balancing for large supercomputers.
Int. J. High Perform. Comput. Appl., 2011

Optimizing communication for Charm++ applications by reducing network contention.
Concurr. Comput. Pract. Exp., 2011

Improving communication performance in dense linear algebra via topology aware collectives.
Proceedings of the Conference on High Performance Computing Networking, 2011

Avoiding hot-spots on two-level direct networks.
Proceedings of the Conference on High Performance Computing Networking, 2011

Creating a Tool Set for Optimizing Topology-Aware Node Mappings.
Proceedings of the Tools for High Performance Computing 2011, 2011

Architectural Constraints to Attain 1 Exaflop/s for Three Scientific Application Classes.
Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing, 2011

Simulation-Based Performance Analysis and Tuning for a Two-Level Directly Connected System.
Proceedings of the 17th IEEE International Conference on Parallel and Distributed Systems, 2011

Heuristic-Based Techniques for Mapping Irregular Communication Graphs to Mesh Topologies.
Proceedings of the 13th IEEE International Conference on High Performance Computing & Communication, 2011

Weighted locality-sensitive scheduling for mitigating noise on multi-core clusters.
Proceedings of the 18th International Conference on High Performance Computing, 2011

2010
Automating Topology Aware Mapping for Supercomputers.
PhD thesis, 2010

Understanding Application Performance via Micro-benchmarks on Three Large Supercomputers: Intrepid, Ranger and Jaguar.
Int. J. High Perform. Comput. Appl., 2010

Hierarchical Load Balancing for Charm++ Applications on Large Supercomputers.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Automated mapping of regular communication graphs on mesh interconnects.
Proceedings of the 2010 International Conference on High Performance Computing, 2010

2009
Quantifying Network Contention on Large Parallel Machines.
Parallel Process. Lett., 2009

Topology aware task mapping techniques: an api and case study.
Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2009

An evaluative study on the effect of contention on message latencies in large supercomputers.
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

Dynamic topology aware load balancing algorithms for molecular dynamics applications.
Proceedings of the 23rd international conference on Supercomputing, 2009

CkDirect: Unsynchronized One-Sided Communication in a Message-Driven Paradigm.
Proceedings of the ICPPW 2009, 2009

A Case Study of Communication Optimizations on 3D Mesh Interconnects.
Proceedings of the Euro-Par 2009 Parallel Processing, 2009

2008
Benefits of Topology Aware Mapping for Mesh Interconnects.
Parallel Process. Lett., 2008

Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system.
IBM J. Res. Dev., 2008

Fine-grained parallelization of the Car - Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer.
IBM J. Res. Dev., 2008

Overcoming scaling challenges in biomolecular simulations across multiple platforms.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Application-specific topology-aware mapping for three dimensional topologies.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

2007
A Selective Profiling Tool: Towards Automatic Performance Tuning.
Proceedings of the 21st International Parallel and Distributed Processing Symposium (IPDPS 2007), 2007

