Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization.
CoRR, 2024
FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment.
CoRR, 2024
DataSculpt: Crafting Data Landscapes for LLM Post-Training through Multi-objective Partitioning.
CoRR, 2024
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline.
CoRR, 2024
PQCache: Product Quantization-based KVCache for Long Context LLM Inference.
CoRR, 2024
Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs.
CoRR, 2024
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge.
CoRR, 2024
Enabling Parallelism Hot Switching for Efficient Training of Large Language Models.
Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, 2024
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems, 2024
Hetu: a highly efficient automatic parallel distributed deep learning system.
Sci. China Inf. Sci., 2023
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent.
Proc. VLDB Endow., 2023
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement.
Proc. ACM Manag. Data, 2023
Improving Automatic Parallel Training via Balanced Memory Workload Optimization.
CoRR, 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism.
Proc. VLDB Endow., 2022
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning.
CoRR, 2022
HetuMoE: An Efficient Trillion-scale Mixture-of-Expert Distributed Training System.
CoRR, 2022
HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training.
Proceedings of the SIGMOD '22: International Conference on Management of Data, 2022
TSPLIT: Fine-grained GPU Memory Management for Efficient DNN Training via Tensor Splitting.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022
HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework.
Proc. VLDB Endow., 2021
Dense-to-Sparse Gate for Mixture-of-Experts.
CoRR, 2021
Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021