Weihao Cui

Orcid: 0000-0002-6646-5260

According to our database, Weihao Cui authored at least 22 papers between 2019 and 2025.

Bibliography

2025
FLAPS: fluctuation-aware power auction strategy for reducing the power overload probability.
Frontiers Comput. Sci., May, 2025

2024
Accelerating Sparse DNNs Based on Tiled GEMM.
IEEE Trans. Computers, May, 2024

Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization.
CoRR, 2024

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.
CoRR, 2024

Towards Fast Setup and High Throughput of GPU Serverless Computing.
CoRR, 2024

A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters.
CoRR, 2024

2023
Improving Cluster Utilization Through Adaptive Resource Management for Deep Neural Network and CPU Jobs Colocation.
IEEE Trans. Computers, December, 2023

ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-Grained Resource Management.
IEEE Trans. Computers, May, 2023

Optimizing Dynamic Neural Networks with Brainstorm.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Microless: Cost-Efficient Hybrid Deployment of Microservices on IaaS VMs and Serverless.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo.
Proceedings of the 2023 ACM Symposium on Cloud Computing (SoCC), 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022
Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs.
IEEE Trans. Computers, 2022

DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the 2022 International Conference on Supercomputing (ICS), 2022

Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
E²bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
IEEE Trans. Parallel Distributed Syst., 2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.
Proceedings of the International Conference for High Performance Computing, 2021

Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

2019
Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters.
Proceedings of the ACM International Conference on Supercomputing, 2019

Ebird: Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

