Mustafa Abdul Jabbar

Orcid: 0000-0003-1280-130X

According to our database, Mustafa Abdul Jabbar authored at least 40 papers between 2008 and 2024.

Bibliography

2024
Accelerating communication with multi-HCA aware collectives in MPI.
Concurr. Comput. Pract. Exp., 2024

OMB-CXL: A Micro-Benchmark Suite for Evaluating MPI Communication Utilizing Compute Express Link Memory Devices.
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024

OMB-FPGA: A Microbenchmark Suite for FPGA-aware MPIs using OpenCL and SYCL.
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024

Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024

PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

Towards Accelerating k-NN with MPI and Near-Memory Processing.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

OHIO: Improving RDMA Network Scalability in MPI_Alltoall Through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

Demystifying the Communication Characteristics for Distributed Transformer Models.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024

2023
Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries.
IEEE Micro, 2023

Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version.
CoRR, 2023

Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

MCR-DL: Mix-and-Match Communication Runtime for Deep Learning.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023

Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication.
Proceedings of the 37th International Conference on Supercomputing, 2023

Performance Characterization of Using Quantization for DNN Inference on Edge Devices.
Proceedings of the 7th IEEE International Conference on Fog and Edge Computing, 2023

Designing In-network Computing Aware Reduction Collectives in MPI.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023

Optimized All-to-All Connection Establishment for High-Performance MPI Libraries Over InfiniBand.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023

Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023

2022
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes.
ACM Trans. Archit. Code Optim., 2022

STEER: Asymmetry-aware Energy Efficient Task Scheduler for Cluster-based Multicore Architectures.
Proceedings of the 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), 2022

Shisha: Online Scheduling of CNN Pipelines on Heterogeneous Architectures.
Proceedings of the Parallel Processing and Applied Mathematics, 2022

Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022

Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022

Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI.
Proceedings of the IEEE International Conference on Cluster Computing, 2022

2021
Mitigating inefficient task mappings with an Adaptive Resource-Moldable Scheduler (ARMS).
CoRR, 2021

An online guided tuning approach to run CNN pipelines on edge devices.
Proceedings of the CF '21: Computing Frontiers Conference, 2021

2020
Abstraction Layer For Standardizing APIs of Task-Based Engines.
IEEE Trans. Parallel Distributed Syst., 2020

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments.
Proceedings of the ICPP Workshops '20: Workshops, Edmonton, AB, Canada, August 17-20, 2020


2019
Extreme Scale FMM-Accelerated Boundary Integral Equation Solver for Wave Scattering.
SIAM J. Sci. Comput., 2019

LEGaTO: Low-Energy, Secure, and Resilient Toolset for Heterogeneous Computing.
CoRR, 2019

An Adaptive Performance-oriented Scheduler for Static and Dynamic Heterogeneity.
CoRR, 2019

2017
Communication Reducing Algorithms for Distributed Hierarchical N-Body Problems with Boundary Distributions.
Proceedings of the High Performance Computing - 32nd International Conference, 2017

Performance Evaluation of Computation and Communication Kernels of the Fast Multipole Method on Intel Manycore Architecture.
Proceedings of the Euro-Par 2017: Parallel Processing - 23rd International Conference on Parallel and Distributed Computing, Santiago de Compostela, Spain, August 28, 2017

2015
Composing Algorithmic Skeletons to Express High-Performance Scientific Applications.
Proceedings of the 29th ACM on International Conference on Supercomputing, 2015

2014
Asynchronous Execution of the Fast Multipole Method Using Charm++.
CoRR, 2014

2009
InfoPods: Zigbee-based remote information monitoring devices for smart-homes.
IEEE Trans. Consumer Electron., 2009

2008
A Pervasive Assessment System: Extending QTI to Incorporate Ad-hoc Wireless Sensors.
Proceedings of the 8th IEEE International Conference on Advanced Learning Technologies, 2008
