Weihao Cui
Orcid: 0000-0002-6646-5260
According to our database, Weihao Cui authored at least 22 papers between 2019 and 2025.
Bibliography
2025
FLAPS: fluctuation-aware power auction strategy for reducing the power overload probability.
Frontiers Comput. Sci., May, 2025
2024
Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization.
CoRR, 2024
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving.
CoRR, 2024
A Codesign of Scheduling and Parallelization for Large Model Training in Heterogeneous Clusters.
CoRR, 2024
2023
Improving Cluster Utilization Through Adaptive Resource Management for Deep Neural Network and CPU Jobs Colocation.
IEEE Trans. Computers, December, 2023
IEEE Trans. Computers, May, 2023
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023
Microless: Cost-Efficient Hybrid Deployment of Microservices on IaaS VMs and Serverless.
Proceedings of the 29th IEEE International Conference on Parallel and Distributed Systems, 2023
Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo.
Proceedings of the 2023 ACM Symposium on Cloud Computing, SoCC 2023, 2023
Proceedings of the 20th ACM International Conference on Computing Frontiers, 2023
2022
IEEE Trans. Computers, 2022
DVABatch: Diversity-aware Multi-Entry Multi-Exit Batching for Efficient Processing of DNN Services on GPUs.
Proceedings of the 2022 USENIX Annual Technical Conference, 2022
PAME: precision-aware multi-exit DNN serving for reducing latencies of batched inferences.
Proceedings of the ICS '22: 2022 International Conference on Supercomputing, Virtual Event, June 28, 2022
Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS.
Proceedings of the IEEE International Symposium on High-Performance Computer Architecture, 2022
2021
E²bird: Enhanced Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
IEEE Trans. Parallel Distributed Syst., 2021
Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.
Proceedings of the International Conference for High Performance Computing, 2021
Proceedings of the 39th IEEE International Conference on Computer Design, 2021
2020
Proceedings of the 40th IEEE International Conference on Distributed Computing Systems, 2020
2019
Laius: Towards latency awareness and improved utilization of spatial multitasking accelerators in datacenters.
Proceedings of the ACM International Conference on Supercomputing, 2019
Ebird: Elastic Batch for Improving Responsiveness and Throughput of Deep Learning Services.
Proceedings of the 37th IEEE International Conference on Computer Design, 2019