Fengguang Song

Orcid: 0000-0001-7382-093X

According to our database, Fengguang Song authored at least 50 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.





WIPE: A Write-Optimized Learned Index for Persistent Memory.
ACM Trans. Archit. Code Optim., June, 2024

Asynchronous modeling workflows in CyberWater with on-demand HPC/Cloud access.
Future Gener. Comput. Syst., 2024

Efficient in-situ workflow planning for geographically distributed heterogeneous environments.
Future Gener. Comput. Syst., December, 2023

A Distributed-GPU Deep Reinforcement Learning System for Solving Large Graph Optimization Problems.
ACM Trans. Parallel Comput., June, 2023

INSTANT: A Runtime Framework to Orchestrate In-Situ Workflows.
Proceedings of the Euro-Par 2023: Parallel Processing - 29th International Conference on Parallel and Distributed Computing, Limassol, Cyprus, August 28, 2023

CyberWater: An Open Framework for Data and Model Integration in Water Science and Engineering.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

Designing a parallel Feel-the-Way clustering algorithm on HPC systems.
Int. J. High Perform. Comput. Appl., 2021

OpenGraphGym-MG: Using Reinforcement Learning to Solve Large Graph Optimization Problems on MultiGPU Systems.
CoRR, 2021

X-composer: enabling cross-environments in-situ workflows between HPC and cloud.
Proceedings of the PASC '21: Platform for Advanced Scientific Computing Conference, 2021

Designing a 3D Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems.
Proceedings of the Euro-Par 2021: Parallel Processing, 2021

Accelerating complex modeling workflows in CyberWater using on-demand HPC/Cloud resources.
Proceedings of the 17th IEEE International Conference on eScience, 2021

SDN helps Big Data to optimize access to data.
CoRR, 2020

ElasticBroker: Combining HPC with Cloud to Provide Realtime Insights into Simulations.
CoRR, 2020

Utilizing GPU Performance Counters to Characterize GPU Kernels via Machine Learning.
Proceedings of the Computational Science - ICCS 2020, 2020

OpenGraphGym: A Parallel Reinforcement Learning Framework for Graph Optimization Problems.
Proceedings of the Computational Science - ICCS 2020, 2020

Building a scientific workflow framework to enable real-time machine learning and visualization.
Concurr. Comput. Pract. Exp., 2019

FQL: An Extensible Feature Query Language and Toolkit on Searching Software Characteristics for HPC Applications.
Proceedings of the Tools and Techniques for High Performance Computing, 2019

XScan: An Integrated Tool for Understanding Open Source Community-Based Scientific Code.
Proceedings of the Computational Science - ICCS 2019, 2019

An Extended Roofline Model with Communication-Awareness for Distributed-Memory HPC Systems.
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2019

Userland CO-PAGER: boosting data-intensive applications with non-volatile memory, userspace paging.
Proceedings of the 3rd International Conference on High Performance Compilation, 2019

Interactive 3D simulation for fluid-structure interactions using dual coupled GPUs.
J. Supercomput., 2018

Scaling Up Parallel Computation of Tiled QR Factorizations by a Distributed Scheduling Runtime System and Analytical Modeling.
Parallel Process. Lett., 2018

On A Simpler and Faster Derivation of Single Use Reliability Mean and Variance for Model-Based Statistical Testing (S).
Proceedings of the 30th International Conference on Software Engineering and Knowledge Engineering, 2018

Designing a Parallel Memory-Aware Lattice Boltzmann Algorithm on Manycore Systems.
Proceedings of the 30th International Symposium on Computer Architecture and High Performance Computing, 2018

Performance analysis and optimization of in-situ integration of simulation with data analysis: zipping applications up.
Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing, 2018

A Real-Time Machine Learning and Visualization Framework for Scientific Workflows.
Proceedings of the Practice and Experience in Advanced Research Computing 2017: Sustainability, 2017

A Simpler and More Direct Derivation of System Reliability Using Markov Chain Usage Models.
Proceedings of the 29th International Conference on Software Engineering and Knowledge Engineering, 2017

Designing a Synchronization-reducing Clustering Method on Manycores: Some Issues and Improvements.
Proceedings of the Machine Learning on HPC Environments, 2017

Correcting soft errors online in fast Fourier transform.
Proceedings of the International Conference for High Performance Computing, 2017

OptiMatch: Enabling an Optimal Match between Green Power and Various Workloads for Renewable-Energy Powered Storage Systems.
Proceedings of the 46th International Conference on Parallel Processing, 2017

An Algorithm for Forward Reduction in Sequence-Based Software Specification.
Int. J. Softw. Eng. Knowl. Eng., 2016

suCAQR: A Simplified Communication-Avoiding QR Factorization Solver Using the TBLAS Framework.
Proceedings of the 22nd IEEE International Conference on Parallel and Distributed Systems, 2016

Modeling and Implementation of an Asynchronous Approach to Integrating HPC and Big Data Analysis.
Proceedings of the International Conference on Computational Science 2016, 2016

A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems.
Concurr. Comput. Pract. Exp., 2015

Quality Assurance through Rigorous Software Specification and Testing: A Case Study.
Proceedings of the 2015 International Conference on Soft Computing and Software Engineering, 2015

LBM-IB: A Parallel Library to Solve 3D Fluid-Structure Interaction Problems on Manycore Systems.
Proceedings of the 44th International Conference on Parallel Processing, 2015

Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores.
Proceedings of the 2014 International Conference on Supercomputing, 2014

KV-Cache: A Scalable High-Performance Web-Object Cache for Manycore.
Proceedings of the IEEE/ACM 6th International Conference on Utility and Cloud Computing, 2013

Implementing a high-performance recommendation system using Phoenix++.
Proceedings of the 8th International Conference for Internet Technology and Secured Transactions, 2013

A scalable framework for heterogeneous GPU-based clusters.
Proceedings of the 24th ACM Symposium on Parallelism in Algorithms and Architectures, 2012

Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems.
Proceedings of the International Conference on Supercomputing, 2012

Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems.
Proceedings of the Conference on High Performance Computing Networking, 2010

Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems.
Proceedings of the ACM/IEEE Conference on High Performance Computing, 2009

A Scalable Non-blocking Multicast Scheme for Distributed DAG Scheduling.
Proceedings of the Computational Science, 2009

Analytical modeling and optimization for affinity based thread scheduling on multicore systems.
Proceedings of the 2009 IEEE International Conference on Cluster Computing, August 31, 2009

L2 Cache Modeling for Scientific Applications on Chip Multi-Processors.
Proceedings of the 2007 International Conference on Parallel Processing (ICPP 2007), 2007

Feedback-directed thread scheduling with memory considerations.
Proceedings of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16 2007), 2007

Performance Instrumentation and Compiler Optimizations for MPI/OpenMP Applications.
Proceedings of the OpenMP Shared Memory Parallel Programming - International Workshops, 2006

Automatic Experimental Analysis of Communication Patterns in Virtual Topologies.
Proceedings of the 34th International Conference on Parallel Processing (ICPP 2005), 2005

An Algebra for Cross-Experiment Performance Analysis.
Proceedings of the 33rd International Conference on Parallel Processing (ICPP 2004), 2004
