Aamir Shafi
Orcid: 0000-0002-1924-2769
According to our database, Aamir Shafi authored at least 80 papers between 2003 and 2024.
Bibliography
2024
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer.
CoRR, 2024
Concurr. Comput. Pract. Exp., 2024
Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning.
Proceedings of the Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, 2024
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters.
Proceedings of the ISC High Performance 2024 Research Paper Proceedings (39th International Conference), 2024
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024
Proceedings of the 53rd International Conference on Parallel Processing, 2024
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
Characterizing Communication in Distributed Parameter-Efficient Fine-Tuning for Large Language Models.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2024
Proceedings of the 24th IEEE International Symposium on Cluster, 2024
2023
J. Comput. Sci. Technol., February, 2023
IEEE Micro, 2023
Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version.
CoRR, 2023
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2023
Performance Characterization of Using Quantization for DNN Inference on Edge Devices.
Proceedings of the 7th IEEE International Conference on Fog and Edge Computing, 2023
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2023
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference.
Proceedings of the 30th IEEE International Conference on High Performance Computing, 2023
Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences.
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
Proceedings of the 23rd IEEE/ACM International Symposium on Cluster, 2023
HARVEST: High-Performance Artificial Vision Framework for Expert Labeling using Semi-Supervised Training.
Proceedings of the IEEE International Conference on Big Data, 2023
Proceedings of the IEEE International Conference on Big Data, 2023
2022
IEEE Micro, 2022
Proceedings of the PEARC '22: Practice and Experience in Advanced Research Computing, Boston, MA, USA, July 10, 2022
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters.
Proceedings of the High Performance Computing - 37th International Conference, 2022
Proceedings of the High Performance Computing - 37th International Conference, 2022
Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters.
Proceedings of the High Performance Computing - 37th International Conference, 2022
Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2022
Proceedings of the Workshop Proceedings of the 51st International Conference on Parallel Processing, 2022
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2022
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters.
Proceedings of the 29th IEEE International Conference on High Performance Computing, 2022
Proceedings of the IEEE/ACM International Workshop on Education for High Performance Computing, 2022
Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI.
Proceedings of the IEEE International Conference on Cluster Computing, 2022
2021
Proceedings of the PEARC '21: Practice and Experience in Advanced Research Computing, 2021
Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs.
Proceedings of the IEEE Symposium on High-Performance Interconnects, 2021
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Proceedings of the 28th IEEE International Conference on High Performance Computing, 2021
Proceedings of the 21st IEEE/ACM International Symposium on Cluster, 2021
2020
Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR.
Proceedings of the 6th IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments, 2020
Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications.
Proceedings of the 27th IEEE International Conference on High Performance Computing, 2020
2019
Student Outcomes Assessment Methodology for ABET Accreditation: A Case Study of Computer Science and Computer Information Systems Programs.
IEEE Access, 2019
2018
Parameter estimation of qualitative biological regulatory networks on high performance computing hardware.
BMC Syst. Biol., 2018
Performance Comparison of a Parallel Recommender Algorithm Across Three Hadoop-Based Frameworks.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018
2016
An efficient schedulability condition for non-preemptive real-time systems at common scheduling points.
J. Supercomput., 2016
Towards Scalable Java HPC with Hybrid and Native Communication Devices in MPJ Express.
Int. J. Parallel Program., 2016
2015
Proceedings of the IEEE Conference on Network Function Virtualization and Software Defined Networks, 2015
Proceedings of the International Conference on Computational Science, 2015
2014
CoRR, 2014
Proceedings of the Workshop on Education for High-Performance Computing, 2014
Proceedings of the International Conference on Computational Science, 2014
Proceedings of the International Conference on Computational Science, 2014
2013
Proceedings of IEEE International Conference on Communications, 2013
Proceedings of the 13th IEEE/ACM International Symposium on Cluster, 2013
2012
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012
Proceedings of the 13th International Conference on Parallel and Distributed Computing, 2012
Proceedings of the IEEE 14th International Conference on e-Health Networking, 2012
2011
Collective Asynchronous Remote Invocation (CARI): A High-Level and Efficient Communication API for Irregular Applications.
Proceedings of the International Conference on Computational Science, 2011
Concurr. Comput. Pract. Exp., 2011
2010
Proceedings of the 8th International Conference on Principles and Practice of Programming in Java, 2010
2009
J. Parallel Distributed Comput., 2009
A comparative study of Java and C performance in two large-scale parallel applications.
Concurr. Comput. Pract. Exp., 2009
Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing, 2009
2008
A parallel implementation of the Finite-Domain Time-Difference algorithm using MPJ express.
Proceedings of the 22nd IEEE International Symposium on Parallel and Distributed Processing, 2008
2007
Scalable Comput. Pract. Exp., 2007
2006
Proceedings of the Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2006
Proceedings of the 5th International Symposium on Parallel and Distributed Computing (ISPDC 2006), 2006
Proceedings of the Computational Science, 2006
Proceedings of the 2006 IEEE International Conference on Cluster Computing, 2006
2005
IEEE Distributed Syst. Online, 2005
2003