Jim M. Brandt

Orcid: 0000-0002-8605-5795

Affiliations:

Sandia National Laboratories

According to our database¹, Jim M. Brandt authored at least 55 papers between 2005 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning.

,

,

Benjamin Schwaller

,

,

,

,

,

,

IEEE Trans. Parallel Distributed Syst., April, 2024

Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer.

Ana Luisa Veroneze Solórzano

,

,

,

Fumiyoshi Shoji

,

,

Benjamin Schwaller

,

Sara Petra Walton

,

,

Proceedings of the International Conference for High Performance Computing, 2024

Workload-Adaptive Scheduling for Efficient Use of Parallel File Systems in High-Performance Computing Clusters.

Alexander V. Goponenko

,

Benjamin A. Allan

,

,

Proceedings of the SC24-W: Workshops of the International Conference for High Performance Computing, 2024

Job Scheduling for HPC Clusters: Constraint Programming vs. Backfilling Approaches.

Alexander V. Goponenko

,

,

Benjamin A. Allan

,

James M. Brandt

,

Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, 2024

Evolving Large Scale HPC Monitoring & Analysis to Track Modern Dynamic Environments.

,

,

Benjamin Schwaller

,

Thomas W. Tucker

Proceedings of the IEEE International Conference on Cluster Computing, 2024

2023

Driving HPC Operations With Holistic Monitoring and Operational Data Analytics (Dagstuhl Seminar 23171).

,

Florina M. Ciorba

,

,

,

Dagstuhl Reports, 2023

Prodigy: Towards Unsupervised Anomaly Detection in Production HPC Systems.

,

,

Benjamin Schwaller

,

,

,

,

,

,

Proceedings of the International Conference for High Performance Computing, 2023

Evaluating HPC Job Run Time Predictions Using Application Input Parameters.

,

Alexander V. Goponenko

,

,

Benjamin A. Allan

,

James M. Brandt

,

Proceedings of the 17th ACM International Conference on Distributed and Event-based Systems, 2023

Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations.

Proceedings of the IEEE International Conference on Cluster Computing, 2023

2022

Metrics for Packing Efficiency and Fairness of HPC Cluster Batch Job Scheduling.

Alexander V. Goponenko

,

,

Christina L. Peterson

,

Benjamin A. Allan

,

,

Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

ALBADross: Active Learning Based Anomaly Diagnosis for Production HPC Systems.

,

,

Benjamin Schwaller

,

,

,

,

,

Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021

Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems.

,

,

,

Benjamin Schwaller

,

,

,

,

,

Proceedings of the High Performance Computing - 36th International Conference, 2021

Systematically inferring I/O performance variability by examining repetitive job behavior.

,

,

Benjamin Schwaller

,

,

Proceedings of the International Conference for High Performance Computing, 2021

Delay sensitivity-driven congestion mitigation for HPC systems.

,

,

,

,

,

,

Zbigniew Kalbarczyk

,

Ravishankar K. Iyer

Proceedings of the ICS '21: 2021 International Conference on Supercomputing, 2021

Using Monitoring Data to Improve HPC Performance via Network-Data-Driven Allocation.

,

,

,

Benjamin Schwaller

,

,

,

,

Proceedings of the 2021 IEEE High Performance Extreme Computing Conference, 2021

E2EWatch: An End-to-End Anomaly Diagnosis Framework for Production HPC Systems.

,

Benjamin Schwaller

,

,

,

,

,

Proceedings of the Euro-Par 2021: Parallel Processing, 2021

Backfilling HPC Jobs with a Multimodal-Aware Predictor.

,

Alexander V. Goponenko

,

Christina L. Peterson

,

Benjamin A. Allan

,

,

Proceedings of the IEEE International Conference on Cluster Computing, 2021

2020

Application-aware Congestion Mitigation for High-Performance Computing Systems.

,

,

,

,

,

,

Zbigniew Kalbarczyk

,

Ravishankar K. Iyer

CoRR, 2020

ALAMO: Autonomous Lightweight Allocation, Management, and Optimization.

,

Kurt B. Ferreira

,

,

,

Jay F. Lofstead

,

Stephen L. Olivier

,

Kevin T. Pedretti

,

Andrew J. Younge

,

,

Proceedings of the Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI, 2020

Measuring Congestion in High-Performance Datacenter Interconnects.

,

,

,

,

,

,

,

,

Zbigniew Kalbarczyk

,

,

Proceedings of the 17th USENIX Symposium on Networked Systems Design and Implementation, 2020

HPC System Data Pipeline to Enable Meaningful Insights through Analysis-Driven Visualizations.

Benjamin Schwaller

,

,

,

Benjamin A. Allan

,

Proceedings of the IEEE International Conference on Cluster Computing, 2020

Towards workload-adaptive scheduling for HPC clusters.

Alexander V. Goponenko

,

Ramin Izadpanah

,

,

Proceedings of the IEEE International Conference on Cluster Computing, 2020

2019

Online Diagnosis of Performance Variation in HPC Systems Using Machine Learning.

,

,

,

,

,

,

,

IEEE Trans. Parallel Distributed Syst., 2019

Production Application Performance Data Streaming for System Monitoring.

Ramin Izadpanah

,

Benjamin A. Allan

,

,

ACM Trans. Model. Perform. Evaluation Comput. Syst., 2019

Understanding Fault Scenarios and Impacts through Fault Injection Experiments in Cielo.

Valerio Formicola

,

,

,

,

,

,

,

,

,

,

,

,

Annette Greiner

,

Zbigniew Kalbarczyk

,

Ravishankar K. Iyer

,

CoRR, 2019

HPAS: An HPC Performance Anomaly Suite for Reproducing Performance Variations.

,

,

,

,

,

,

Proceedings of the 48th International Conference on Parallel Processing, 2019

A Study of Network Congestion in Two Supercomputing High-Speed Interconnects.

,

,

,

,

,

,

Zbigniew T. Kalbarczyk

,

,

Ravishankar K. Iyer

Proceedings of the 2019 IEEE Symposium on High-Performance Interconnects, 2019

2018

An Efficient Latch-free Database Index Based on Multi-dimensional Lists.

,

Ramin Izadpanah

,

,

Proceedings of the 37th IEEE International Performance Computing and Communications Conference, 2018

Integrating Low-latency Analysis into HPC System Monitoring.

Ramin Izadpanah

,

Nichamon Naksinehaboon

,

,

,

Proceedings of the 47th International Conference on Parallel Processing, 2018

Taxonomist: Application Detection Through Rich Monitoring Data.

,

,

,

,

,

,

Proceedings of the Euro-Par 2018: Parallel Processing, 2018

Characterizing Supercomputer Traffic Networks Through Link-Level Analysis.

,

,

,

Zbigniew Kalbarczyk

,

Ravishankar K. Iyer

Proceedings of the IEEE International Conference on Cluster Computing, 2018

Large-Scale System Monitoring Experiences and Recommendations.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

Diagnosing Performance Variations in HPC Applications Using Machine Learning.

,

,

,

,

,

,

,

Proceedings of the High Performance Computing - 32nd International Conference, 2017

Holistic Measurement-Driven System Assessment.

,

,

,

Zbigniew Kalbarczyk

,

Gregory H. Bauer

,

,

Michael T. Showerman

,

,

,

Annette Greiner

,

,

,

Ravishankar K. Iyer

,

Proceedings of the 2017 IEEE International Conference on Cluster Computing, 2017

2016

Continuous whole-system monitoring toward rapid understanding of production HPC applications and systems.

Anthony M. Agelastos

,

Benjamin A. Allan

,

,

,

Sophia Lefantzi

,

,

,

,

Parallel Comput., 2016

Design and Implementation of a Scalable HPC Monitoring System.

,

,

Graham van Heule

,

,

,

,

,

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

Large-Scale Persistent Numerical Data Source Monitoring System Experiences.

,

,

Michael T. Showerman

,

,

,

Gregory H. Bauer

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

HPCMASPA Introduction and Committees.

Benjamin A. Allan

,

,

,

Cory Lueninghoener

,

Nichamon Naksinehaboon

,

,

Proceedings of the 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, 2016

2015

Infrastructure for In Situ System Monitoring and Application Data Analysis.

,

Karen D. Devine

,

Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization, 2015

Extending LDMS to Enable Performance Monitoring in Multi-core Applications.

Steven D. Feldman

,

,

,

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

New Systems, New Behaviors, New Patterns: Monitoring Insights from System Standup.

,

,

,

,

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

Toward Rapid Understanding of Production HPC Applications and Systems.

Anthony M. Agelastos

,

Benjamin A. Allan

,

,

,

Sophia Lefantzi

,

,

,

,

Proceedings of the 2015 IEEE International Conference on Cluster Computing, 2015

2014

The Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications.

Anthony M. Agelastos

,

Benjamin A. Allan

,

,

,

,

,

,

,

Nichamon Naksinehaboon

,

,

,

Michael T. Showerman

,

,

,

Thomas W. Tucker

Proceedings of the International Conference for High Performance Computing, 2014

Demonstrating improved application performance using dynamic monitoring and task mapping.

,

Karen D. Devine

,

,

Kevin T. Pedretti

Proceedings of the 2014 IEEE International Conference on Cluster Computing, 2014

2012

Filtering log data: Finding the needles in the Haystack.

,

,

,

,

,

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2012

2011

Baler: deterministic, lossless log message clustering tool.

,

,

,

,

Chokchai Leangsuksun

Comput. Sci. Res. Dev., 2011

Framework for Enabling System Understanding.

,

,

,

Chokchai Leangsuksun

,

Jackson R. Mayo

,

Philippe P. Pébay

,

,

,

David C. Thompson

,

Proceedings of the Euro-Par 2011: Parallel Processing Workshops - CCPI, CGWS, HeteroPar, HiBB, HPCVirt, HPPC, HPSS, MDGS, ProPer, Resilience, UCHPC, VHPC, Bordeaux, France, August 29, 2011

2010

Combining Virtualization, resource characterization, and Resource management to enable efficient high performance compute platforms through intelligent dynamic resource allocation.

,

,

Vincent De Sapio

,

,

Jackson R. Mayo

,

Philippe P. Pébay

,

,

David C. Thompson

,

Proceedings of the 24th IEEE International Symposium on Parallel and Distributed Processing, 2010

Quantifying effectiveness of failure prediction and response in HPC systems: Methodology and example.

,

,

Vincent De Sapio

,

,

Jackson R. Mayo

,

Philippe P. Pébay

,

,

David C. Thompson

,

Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W 2010), Chicago, Illinois, USA, June 28, 2010

Using Cloud Constructs and Predictive Analysis to Enable Pre-Failure Process Migration in HPC Systems.

,

,

Vincent De Sapio

,

,

Jackson R. Mayo

,

Philippe P. Pébay

,

,

David C. Thompson

,

Proceedings of the 10th IEEE/ACM International Conference on Cluster, 2010

2009

Resource monitoring and management with OVIS to enable HPC in cloud computing environments.

,

,

Jackson R. Mayo

,

Philippe P. Pébay

,

,

David C. Thompson

,

Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009

2008

Ovis-2: A robust distributed architecture for scalable RAS.

,

Bert J. Debusschere

,

,

Jackson R. Mayo

,

Philippe P. Pébay

,

David C. Thompson

,

Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008

Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems.

,

Bert J. Debusschere

,

,

Jackson R. Mayo

,

Philippe P. Pébay

,

David C. Thompson

,

Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), 2008

2006

OVIS: a tool for intelligent, real-time monitoring of computational clusters.

,

,

,

Philippe P. Pébay

Proceedings of the 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), 2006

2005

Meaningful Automated Statistical Analysis of Large Computational Clusters.

,

,

Youssef M. Marzouk

,

Philippe P. Pébay

Proceedings of the 2005 IEEE International Conference on Cluster Computing (CLUSTER 2005), September 26, 2005
