Lingxiao Ma

ORCID: 0009-0009-9524-5476

According to our database¹, Lingxiao Ma authored at least 30 papers between 2005 and 2024.

Collaborative distances:

Dijkstra number² of three.
Erdős number³ of four.

Bibliography

2024

LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration.

CoRR, 2024

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor.

CoRR, 2024

T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge.

CoRR, 2024

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.

CoRR, 2024

Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor with T10.

Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores.

Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation.

Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation, 2024

2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.

Proc. ACM Manag. Data, 2023

BitNet: Scaling 1-bit Transformers for Large Language Models.

CoRR, 2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation.

CoRR, 2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation.

Proceedings of the 29th Symposium on Operating Systems Principles, 2023

Cocktailer: Analyzing and Optimizing Dynamic Control Flow in Deep Learning.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Optimizing Dynamic Neural Networks with Brainstorm.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Welder: Scheduling Deep Learning Memory Access via Tile-graph.

Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation, 2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning.

Proceedings of the Sixth Conference on Machine Learning and Systems, 2023

2022

CuWide: Towards Efficient Flow-Based Training for Sparse Wide Models on GPUs.

IEEE Trans. Knowl. Data Eng., 2022

ROLLER: Fast and Efficient Tensor Compilation for Deep Learning.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute.

Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation, 2022

2021

Dense-to-Sparse Gate for Mixture-of-Experts.

CoRR, 2021

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.

Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs (Extended Abstract).

Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Accelerating GNN training with locality-aware partial execution.

Proceedings of the APSys '21: 12th ACM SIGOPS Asia-Pacific Workshop on Systems, 2021

2020

Architectural Implications of Graph Neural Networks.

IEEE Comput. Archit. Lett., 2020

Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks.

Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation, 2020

PCGCN: Partition-Centric Processing for Accelerating Graph Convolutional Network.

Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

2019

NeuGraph: Parallel Deep Neural Network Computation on Large Graphs.

Proceedings of the 2019 USENIX Annual Technical Conference, 2019

SeerNet: Predicting Convolutional Neural Network Feature-Map Sparsity Through Low-Bit Quantization.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018

Towards Efficient Large-Scale Graph Neural Network Computing.

CoRR, 2018

2017

Garaph: Efficient GPU-accelerated Graph Processing on a Single Machine with Balanced Replication.

Proceedings of the 2017 USENIX Annual Technical Conference, 2017

2005

CoopStreaming: A Novel Peer-to-Peer System for Fast Live Media Streaming.

Proceedings of the Advances in Web-Age Information Management, 2005
