Khaled Z. Ibrahim

According to our database, Khaled Z. Ibrahim authored at least 62 papers between 2001 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
VAN-DAMME: GPU-accelerated and symmetry-assisted quantum optimal control of multi-qubit systems.
Comput. Phys. Commun., 2025

2024
QRCODE: Massively parallelized real-time time-dependent density functional theory for periodic systems.
Comput. Phys. Commun., 2024

TRAVOLTA: GPU acceleration and algorithmic improvements for constructing quantum optimal control fields in photo-excited systems.
Comput. Phys. Commun., 2024

Scalable Training of Graph Foundation Models for Atomistic Materials Modeling: A Case Study with HydraGNN.
CoRR, 2024

An Evaluation of Real-time Adaptive Sampling Change Point Detection Algorithm using KCUSUM.
CoRR, 2024

A Systematic Study of Parallelization Strategies for Optimizing Scientific Computing Performance Bounds.
Proceedings of the 37th IEEE International System-on-Chip Conference, 2024

Cost-Effective Methodology for Complex Tuning Searches in HPC: Navigating Interdependencies and Dimensionality.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

MDLoader: A Hybrid Model-driven Data Loader for Distributed Deep Neural Networks Training.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2024

2023
Exploring temporal community evolution: algorithmic approaches and parallel optimization for dynamic community detection.
Appl. Netw. Sci., December, 2023

2022
Enhancing scalability of a matrix-free eigensolver for studying many-body localization.
Int. J. High Perform. Comput. Appl., 2022

ML-based Performance Portability for Time-Dependent Density Functional Theory in HPC Environments.
Proceedings of the IEEE/ACM International Workshop on Performance Modeling, 2022

Performance Portability of Sparse Block Diagonal Matrix Multiple Vector Multiplications on GPUs.
Proceedings of the IEEE/ACM International Workshop on Performance, 2022

Preprocessing Pipeline Optimization for Scientific Deep Learning Workloads.
Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium, 2022

2021
Architectural Requirements for Deep Learning Workloads in HPC Environments.
Proceedings of the 2021 International Workshop on Performance Modeling, 2021

Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures.
Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops, 2021

CSPACER: A Reduced API Set Runtime for the Space Consistency Model.
Proceedings of the HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, 2021

2020
Tuning floating-point precision using dynamic program information and temporal locality.
Proceedings of the International Conference for High Performance Computing, 2020

Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches.
Proceedings of the 2020 IEEE/ACM Performance Modeling, 2020

2019
Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers.
Int. J. High Perform. Comput. Appl., 2019

Performance analysis of deep learning workloads using roofline trajectories.
CCF Trans. High Perform. Comput., 2019

Toward a Programmable Analysis and Visualization Framework for Interactive Performance Analytics.
Proceedings of the IEEE/ACM International Workshop on Programming and Performance Visualization Tools, 2019

Optimizing Breadth-First Search at Scale Using Hardware-Accelerated Space Consistency.
Proceedings of the 26th IEEE International Conference on High Performance Computing, 2019

Performance Analysis of GPU Programming Models Using the Roofline Scaling Trajectories.
Proceedings of the Benchmarking, Measuring, and Optimizing, 2019

2018
Roofline Scaling Trajectories: A Method for Parallel Application and Architectural Performance Analysis.
Proceedings of the 2018 International Conference on High Performance Computing & Simulation, 2018

2017
Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends.
J. Parallel Distributed Comput., 2017

Reaching bandwidth saturation using transparent injection parallelization.
Int. J. High Perform. Comput. Appl., 2017

APHiD: Hierarchical Task Placement to Enable a Tapered Fat Tree Topology for Lower Power and Cost in HPC Networks.
Proceedings of the 17th IEEE/ACM International Symposium on Cluster, 2017

2016
Scaling Spark on Lustre.
Proceedings of the High Performance Computing, 2016

Extreme scale plasma turbulence simulations on top supercomputers worldwide.
Proceedings of the International Conference for High Performance Computing, 2016

Characterizing the Performance of Hybrid Memory Cube Using ApexMAP Application Probes.
Proceedings of the Second International Symposium on Memory Systems, 2016

Scaling Spark on HPC Systems.
Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, 2016

2015
Exploiting communication concurrency on high performance computing systems.
Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores, 2015

2014
The Case for Partitioning Virtual Machines on Multicore Architectures.
IEEE Trans. Parallel Distributed Syst., 2014

Efficient Interoperability of OpenSHMEM on Multicore Architectures.
Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models, 2014

An Evaluation of One-Sided and Two-Sided Communication Paradigms on Relaxed-Ordering Interconnect.
Proceedings of the 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 2014

On the conditions for efficient interoperability with threads: an experience with PGAS languages using cray communication domains.
Proceedings of the 2014 International Conference on Supercomputing, 2014

Analysis and tuning of libtensor framework on multicore architectures.
Proceedings of the 21st International Conference on High Performance Computing, 2014

2013
Analysis and optimization of gyrokinetic toroidal simulations on homogenous and heterogenous platforms.
Int. J. High Perform. Comput. Appl., 2013

Kinetic turbulence simulations at extreme scale on leadership-class systems.
Proceedings of the International Conference for High Performance Computing, 2013

2012
Code Development of High-Performance Applications for Power-Efficient Architectures.
Handbook of Energy-Aware and Green Computing - Two Volume Set, 2012

Poster: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Abstract: Advances in Gyrokinetic Particle in Cell Simulation for Fusion Plasmas to Extreme Scale.
Proceedings of the 2012 SC Companion: High Performance Computing, 2012

Congestion avoidance on manycore high performance computing systems.
Proceedings of the International Conference on Supercomputing, 2012

Concurrent Phase Classification for Accelerating MPSoC Simulation.
Proceedings of the ARCS 2012 Workshops, 28 February - 2 March 2012, Munich, Germany, 2012

2011
Gyrokinetic particle-in-cell optimization on emerging multi- and manycore platforms.
Parallel Comput., 2011

Gyrokinetic toroidal simulations on leading multi- and manycore HPC systems.
Proceedings of the Conference on High Performance Computing Networking, 2011

Optimized pre-copy live migration for memory intensive applications.
Proceedings of the Conference on High Performance Computing Networking, 2011

Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines.
Proceedings of the 11th IEEE/ACM International Symposium on Cluster, 2011

2010
Parallel application sampling for accelerating MPSoC simulation.
Des. Autom. Embed. Syst., 2010

Characterizing the Relation Between Apex-Map Synthetic Probes and Reuse Distance Distributions.
Proceedings of the 39th International Conference on Parallel Processing, 2010

Bridging the gap between complex software paradigms and power-efficient parallel architectures.
Proceedings of the International Green Computing Conference 2010, 2010

2009
Power-Aware Bus Coscheduling for Periodic Realtime Applications Running on Multiprocessor SoC.
Trans. High Perform. Embed. Archit. Compil., 2009

Efficient SIMDization and data management of the Lattice QCD computation on the Cell Broadband Engine.
Sci. Program., 2009

2008
Fine-grained parallelization of lattice QCD kernel routine on GPUs.
J. Parallel Distributed Comput., 2008

Implementing Wilson-Dirac operator on the cell broadband engine.
Proceedings of the 22nd Annual International Conference on Supercomputing, 2008

Multi-granularity sampling for simulating concurrent heterogeneous applications.
Proceedings of the 2008 International Conference on Compilers, 2008

2007
Adaptive Sampling for Efficient MPSoC Architecture Simulation.
Proceedings of the 15th International Symposium on Modeling, 2007

2005
Correlation between Detailed and Simplified Simulations in Studying Multiprocessor Architecture.
Proceedings of the 23rd International Conference on Computer Design (ICCD 2005), 2005

Efficient Architectural Support for Secure Bus-Based Shared Memory Multiprocessor.
Proceedings of the Advances in Computer Systems Architecture, 10th Asia-Pacific Conference, 2005

2003
Extending OpenMP to Support Slipstream Execution Mode.
Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS 2003), 2003

Slipstream Execution Mode for CMP-Based Multiprocessors.
Proceedings of the Ninth International Symposium on High-Performance Computer Architecture (HPCA'03), 2003

2001
On the Exploitation of Value Predication and Producer Identification to Reduce Barrier Synchronization Time.
Proceedings of the 15th International Parallel & Distributed Processing Symposium (IPDPS-01), 2001
