Weihao Cui

Orcid: 0000-0002-6646-5260

According to our database, Weihao Cui authored at least 22 papers between 2019 and 2025.

Bibliography

2025
FLAPS: fluctuation-aware power auction strategy for reducing the power overload probability.
Frontiers Comput. Sci., May, 2025

2024
Accelerating Sparse DNNs Based on Tiled GEMM.
IEEE Trans. Computers, May, 2024

Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization.
CoRR, 2024

The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.
CoRR, 2024

Towards Fast Setup and High Throughput of GPU Serverless Computing.
CoRR, 2024

A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters.
CoRR, 2024

2023
Improving Cluster Utilization Through Adaptive Resource Management for Deep Neural Network and CPU Jobs Colocation.
IEEE Trans. Computers, December, 2023

ISPA: Exploiting Intra-SM Parallelism in GPUs via Fine-Grained Resource Management.
IEEE Trans. Computers, May, 2023

Optimizing Dynamic Neural Networks with Brainstorm.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Microless: Cost-Efficient Hybrid Deployment of Microservices on IaaS VMs and Serverless.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023

Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo.
Proceedings of the 2023 ACM Symposium on Cloud Computing (SoCC), 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs.
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023

2022
Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs.
IEEE Trans. Computers, 2022

DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022

PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the 2022 International Conference on Supercomputing (ICS), 2022

Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022

2021
E²bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
IEEE Trans. Parallel Distributed Syst., 2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.
Proceedings of the International Conference for High Performance Computing, 2021

Exploiting Intra-SM Parallelism in GPUs via Persistent and Elastic Blocks.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
CODA: Improving Resource Utilization by Slimming and Co-locating DNN and CPU Jobs.
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020

2019
Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters.
Proceedings of the ACM International Conference on Supercomputing, 2019

Ebird: Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019

