Murali Emani

Orcid: 0000-0002-6279-0007

Affiliations:
  • Argonne National Laboratory, IL, USA


According to our database, Murali Emani authored at least 52 papers between 2013 and 2024.

Bibliography

2024
Thorough Characterization and Analysis of Large Transformer Model Training At-Scale.
Proc. ACM Meas. Anal. Comput. Syst., 2024

AI-coupled HPC Workflow Applications, Middleware and Performance.
CoRR, 2024

Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor.
Proceedings of the 2024 USENIX Annual Technical Conference, 2024

Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

WActiGrad: Structured Pruning for Efficient Finetuning and Inference of Large Language Models on AI Accelerators.
Proceedings of the Euro-Par 2024: Parallel Processing, 2024

A Multi-Level, Multi-Scale Visual Analytics Approach to Assessment of Multifidelity HPC Systems.
Proceedings of the 24th IEEE International Symposium on Cluster, 2024

2023
A survey of techniques for optimizing transformer inference.
J. Syst. Archit., November, 2023

GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics.
Int. J. High Perform. Comput. Appl., November, 2023

DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies.
CoRR, 2023

A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators.
CoRR, 2023

Neural Architecture Search Benchmarks: Insights and Survey.
IEEE Access, 2023

Differentiable Neural Architecture, Mixed Precision and Accelerator Co-Search.
IEEE Access, 2023

HPC-GPT: Integrating Large Language Model for High-Performance Computing.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Data Race Detection Using Large Language Models.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Characterizing the Performance of Triangle Counting on Graphcore's IPU Architecture.
Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, 2023

Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation.
Proceedings of the 2nd International Workshop on Extreme Heterogeneity Solutions, 2023

LM4HPC: Towards Effective Language Model Application in High-Performance Computing.
Proceedings of the OpenMP: Advanced Task-Based, Device and Compiler Programming, 2023

TrainBF: High-Performance DNN Training Engine Using BFloat16 on AI Accelerators.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

2022
Intelligent resolution: Integrating Cryo-EM with AI-driven multi-resolution simulations to observe the severe acute respiratory syndrome coronavirus-2 replication-transcription machinery in action.
Int. J. High Perform. Comput. Appl., 2022

Towards Seamless Management of AI Models in High-Performance Computing.
CoRR, 2022

FAIR for AI: An interdisciplinary, international, inclusive, and diverse community building perspective.
CoRR, 2022

XUnified: A Framework for Guiding Optimal Use of GPU Unified Memory.
IEEE Access, 2022

Neural Architecture Search for Transformers: A Survey.
IEEE Access, 2022

Making Machine Learning Datasets and Models FAIR for HPC: A Methodology and Case Study.
Proceedings of the Fourth International Conference on Transdisciplinary AI, 2022

AI Benchmarking for Science: Efforts from the MLCommons Science Working Group.
Proceedings of the High Performance Computing. ISC High Performance 2022 International Workshops - Hamburg, Germany, May 29, 2022


Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022

Interactive NLU-Powered Ontology-Based Workflow Synthesis for FAIR Support of HPC.
Proceedings of the IEEE/ACM International Workshop on HPC User Support Tools, 2022

Efficient Design Space Exploration for Sparse Mixed Precision Neural Architectures.
Proceedings of the HPDC '22: The 31st International Symposium on High-Performance Parallel and Distributed Computing, Minneapolis, MN, USA, 27 June 2022

Finding Reusable Machine Learning Components to Build Programming Language Processing Pipelines.
Proceedings of the Software Architecture. ECSA 2022 Tracks and Workshops, 2022

Early Experience with Transformer-Based Similarity Analysis for DataRaceBench.
Proceedings of the Sixth IEEE/ACM International Workshop on Software Correctness for HPC Applications, 2022

Towards neural architecture-aware exploration of compiler optimizations in a deep learning graph compiler.
Proceedings of the CF '22: 19th ACM International Conference on Computing Frontiers, Turin, Italy, May 17, 2022

Toward an In-Depth Analysis of Multifidelity High Performance Computing Systems.
Proceedings of the 22nd IEEE International Symposium on Cluster, 2022

2021
Accelerating Scientific Applications With SambaNova Reconfigurable Dataflow Architecture.
Comput. Sci. Eng., 2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems.
CoRR, 2021

Stream-AI-MD: streaming AI-driven adaptive molecular simulations for heterogeneous computing platforms.
Proceedings of the PASC '21: Platform for Advanced Scientific Computing Conference, 2021

HPCFAIR: Enabling FAIR AI for HPC Applications.
Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021

HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing.
Proceedings of the IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2021


2020
EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications.
Concurr. Comput. Pract. Exp., 2020

2019
Machine Learning Guided Optimal Use of GPU Unified Memory.
Proceedings of the 2019 IEEE/ACM Workshop on Memory Centric High Performance Computing, 2019

MELA: A Visual Analytics Tool for Studying Multifidelity HPC System Logs.
Proceedings of the 3rd IEEE/ACM Industry/University Joint International Workshop on Data-center Automation, 2019

2018
Data Placement Optimization in GPU Memory Hierarchy using Predictive Modeling.
Proceedings of the Workshop on Memory Centric High Performance Computing, 2018

Is Data Placement Optimization Still Relevant on Newer GPUs?
Proceedings of the 2018 IEEE/ACM Performance Modeling, 2018

MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications.
Proceedings of the 25th European MPI Users' Group Meeting, 2018

Bootstrapping Parameter Space Exploration for Fast Tuning.
Proceedings of the 32nd International Conference on Supercomputing, 2018

2016
Mapping Medley: Adaptive Parallelism Mapping with Varying Optimization Goals.
Proceedings of the Languages and Compilers for Parallel Computing, 2016

Integrating Algorithmic Parameters into Benchmarking and Design Space Exploration in 3D Scene Understanding.
Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, 2016

2015
Adaptive parallelism mapping in dynamic environments using machine learning.
PhD thesis, 2015

Celebrating diversity: a mixture of experts approach for runtime mapping in dynamic environments.
Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2015

2014
Change Detection Based Parallelism Mapping: Exploiting Offline Models and Online Adaptation.
Proceedings of the Languages and Compilers for Parallel Computing, 2014

2013
Smart, adaptive mapping of parallelism in the presence of external workload.
Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization, 2013

