Lingxiao Ma

ORCID: 0009-0009-9524-5476

According to our database, Lingxiao Ma authored at least 30 papers between 2005 and 2024.

Bibliography

2024
LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.
CoRR, 2024

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor.
CoRR, 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.
CoRR, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
CoRR, 2024

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.
Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.
Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

2023
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.
Proc. ACM Manag. Data, 2023

BitNet: Scaling 1-bit Transformers for Large Language Models.
CoRR, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation.
CoRR, 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.
Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Optimizing Dynamic Neural Networks with Brainstorm.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Welder: Scheduling Deep Learning Memory Access via Tile-graph.
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.
Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

2022
CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs.
IEEE Trans. Knowl. Data Eng., 2022

ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021
Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs (Extended Abstract).
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Accelerating GNN training with locality-aware partial execution.
Proceedings of the APSys '21: 12th ACM SIGOPS Asia-Pacific Workshop on Systems, 2021

2020
Architectural Implications of Graph Neural Networks.
IEEE Comput. Archit. Lett., 2020

Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks.
Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network.
Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019
NeuGraph: Parallel Deep Neural Network Computation on Large Graphs.
Proceedings of the 2019 USENIX Annual Technical Conference, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Towards Efficient Large-Scale Graph Neural Network Computing.
CoRR, 2018

2017
Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication.
Proceedings of the 2017 USENIX Annual Technical Conference, 2017

2005
CoopStreaming: A Novel Peer-to-Peer System for Fast Live Media Streaming.
Proceedings of the Advances in Web-Age Information Management, 2005
