William Fedus

According to our database, William Fedus authored at least 29 papers between 2018 and 2024.

Bibliography

2024
Scaling Instruction-Finetuned Language Models.
J. Mach. Learn. Res., 2024

Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

PaLM: Scaling Language Modeling with Pathways.
J. Mach. Learn. Res., 2023

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts.
CoRR, 2023

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

2022
Emergent Abilities of Large Language Models.
Trans. Mach. Learn. Res., 2022

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.
J. Mach. Learn. Res., 2022

Scaling Instruction-Finetuned Language Models.
CoRR, 2022

A Review of Sparse Expert Models in Deep Learning.
CoRR, 2022

Designing Effective Sparse Expert Models.
CoRR, 2022

Scale Efficiently: Insights from Pretraining and Finetuning Transformers.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers.
CoRR, 2021

Revisiting ResNets: Improved Training and Scaling Strategies.
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
On Catastrophic Interference in Atari 2600 Games.
CoRR, 2020

Revisiting Fundamentals of Experience Replay.
Proceedings of the 37th International Conference on Machine Learning, 2020

On Bonus Based Exploration Methods In The Arcade Learning Environment.
Proceedings of the 8th International Conference on Learning Representations, 2020

Language GANs Falling Short.
Proceedings of the 8th International Conference on Learning Representations, 2020

Algorithmic Improvements for Deep Reinforcement Learning Applied to Interactive Fiction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment.
CoRR, 2019

Hyperbolic Discounting and Learning over Multiple Horizons.
CoRR, 2019

Deep Graph Infomax.
Proceedings of the 7th International Conference on Learning Representations, 2019

Recall Traces: Backtracking Models for Efficient Reinforcement Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Recall Traces: Backtracking Models for Efficient Reinforcement Learning.
CoRR, 2018

Disentangling the independently controllable factors of variation by interacting with the world.
CoRR, 2018

Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence At Every Step.
Proceedings of the 6th International Conference on Learning Representations, 2018

MaskGAN: Better Text Generation via Filling in the _______.
Proceedings of the 6th International Conference on Learning Representations, 2018