PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel.
Proc. VLDB Endow., 2023
PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel.
CoRR, 2023
Efficient Language Modeling with Sparse all-MLP.
CoRR, 2022
8-bit Optimizers via Block-wise Quantization.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Few-shot Learning with Multilingual Generative Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Efficient Large Scale Language Modeling with Mixtures of Experts.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Automated coronary calcium scoring using deep learning with multicenter external validation.
npj Digit. Medicine, 2021
NormFormer: Improved Transformer Pretraining with Extra Normalization.
CoRR, 2021
Pre-trained Summarization Distillation.
CoRR, 2020
Transformers: State-of-the-Art Natural Language Processing.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020
Incrementally Improving Graph WaveNet Performance on Traffic Prediction.
CoRR, 2019
Classification as Decoder: Trading Flexibility for Control in Medical Dialogue.
CoRR, 2019
Classification As Decoder: Trading Flexibility For Control In Neural Dialogue.
CoRR, 2019
Using Small Proxy Datasets to Accelerate Hyperparameter Search.
CoRR, 2019
Low Resource Text Classification with ULMFit and Backtranslation.
CoRR, 2019