Ningxin Zheng

Orcid: 0009-0009-6449-8972

According to our database, Ningxin Zheng authored at least 27 papers between 2018 and 2024.

Bibliography

2024
Online Streaming Video Super-Resolution With Convolutional Look-Up Table.
IEEE Trans. Image Process., 2024

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference.
CoRR, 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.
CoRR, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

2023
Online Video Super-Resolution With Convolutional Kernel Bypass Grafts.
IEEE Trans. Multim., 2023

Online Video Streaming Super-Resolution with Adaptive Look-Up Table Fusion.
CoRR, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation.
CoRR, 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Optimizing Dynamic Neural Networks with Brainstorm.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs.
IEEE Trans. Computers, 2022

Online Video Super-Resolution with Convolutional Kernel Bypass Graft.
CoRR, 2022

QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services.
Proceedings of the SC22: International Conference for High Performance Computing, 2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services.
Proceedings of the ASPLOS '22: 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February 2022, 2022

2021
nn-METER: Towards Accurate Latency Prediction of DNN Inference on Diverse Edge Devices.
GetMobile Mob. Comput. Commun., 2021

Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision.
CoRR, 2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.
Proceedings of the International Conference for High Performance Computing, 2021

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices.
Proceedings of the MobiSys '21: The 19th Annual International Conference on Mobile Systems, Applications, and Services, Virtual Event, Wisconsin, USA, 24 June, 2021

CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters.
Proceedings of the 39th IEEE International Conference on Computer Design, 2021

2020
Towards QoS-Aware and Resource-Efficient GPU Microservices Based on Spatial Multitasking GPUs In Datacenters.
CoRR, 2020

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds.
Proceedings of the ICPP 2020: 49th International Conference on Parallel Processing, 2020

2019
URSA: Precise Capacity Planning and Contention-aware Scheduling for Public Clouds.
CoRR, 2019

POSTER: Precise Capacity Planning for Database Public Clouds.
Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques, 2019

2018
CLIBE: Precise Cluster-Level I/O Bandwidth Enforcement in Distributed File System.
Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, 2018

