Xiaonan Nie

Orcid: 0000-0001-6766-757X

According to our database, Xiaonan Nie authored at least 20 papers between 2021 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment.
CoRR, 2024

DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning.
CoRR, 2024

BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline.
CoRR, 2024

PQCache: Product Quantization-based KVCache for Long Context LLM Inference.
CoRR, 2024

Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs.
CoRR, 2024

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge.
CoRR, 2024

Enabling Parallelism Hot Switching for Efficient Training of Large Language Models.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024

2023
Hetu: a highly efficient automatic parallel distributed deep learning system.
Sci. China Inf. Sci., January, 2023

Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent.
Proc. VLDB Endow., 2023

FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.
Proc. ACM Manag. Data, 2023

Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
CoRR, 2023

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

2022
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism.
Proc. VLDB Endow., 2022

OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
CoRR, 2022

HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System.
CoRR, 2022

HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training.
Proceedings of the International Conference on Management of Data (SIGMOD '22), 2022

TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

2021
HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework.
Proc. VLDB Endow., 2021

Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021

Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.
Proceedings of the International Conference on Management of Data (SIGMOD '21), 2021
