MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training.
Proc. ACM Manag. Data, February, 2025
BeamVQ: Beam Search with Vector Quantization to Mitigate Data Scarcity in Physical Spatiotemporal Forecasting.
CoRR, February, 2025
Scaling Laws for Floating Point Quantization Training.
CoRR, January, 2025
Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs.
CoRR, 2024
BeamVQ: Aligning Space-Time Forecasting Model via Self-training on Physics-aware Metrics.
CoRR, 2024
PURE: Prompt Evolution with Graph ODE for Out-of-distribution Fluid Dynamics Modeling.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent.
Proc. VLDB Endow., 2023
A Multi-Semantic Metapath Model for Large Scale Heterogeneous Network Representation Learning.
CoRR, 2020