Training LLMs on HPC Systems: Best Practices from the OpenGPT-X Project.
CoRR, April, 2025
Memory and Bandwidth are All You Need for Fully Sharded Data Parallel.
CoRR, April, 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit.
CoRR, 2024
Tokenizer Choice For LLM Training: Negligible or Crucial?
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Toward the Production of Spatiotemporally Consistent Annual Land Cover Maps Using Sentinel-2 Time Series.
IEEE Geosci. Remote. Sens. Lett., 2023
Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas.
CoRR, 2023
Enhancing Training Set Through Multi-Temporal Attention Analysis in Transformers for Multi-Year Land Cover Mapping.
Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 2023
Hearts Gym: Learning Reinforcement Learning as a Team Event.
Proceedings of the Third Teaching Machine Learning and Artificial Intelligence Workshop, 2022
JUWELS Booster - A Supercomputer for Large-Scale AI Research.
Proceedings of the High Performance Computing - ISC High Performance Digital 2021 International Workshops, Frankfurt am Main, Germany, June 24, 2021