Noam Shazeer

Affiliations:
  • Google


According to our database, Noam Shazeer authored at least 50 papers between 1999 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Scaling Up Models and Data with t5x and seqio.
J. Mach. Learn. Res., 2023

PaLM: Scaling Language Modeling with Pathways.
J. Mach. Learn. Res., 2023

2022
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.
J. Mach. Learn. Res., 2022

Scaling Up Models and Data with t5x and seqio.
CoRR, 2022

Designing Effective Sparse Expert Models.
CoRR, 2022

LaMDA: Language Models for Dialog Applications.
CoRR, 2022

2021
Primer: Searching for Efficient Transformers for Language Modeling.
CoRR, 2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs.
CoRR, 2021

Searching for Efficient Transformers for Language Modeling.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.
Proceedings of the 9th International Conference on Learning Representations, 2021

Do Transformer Modifications Transfer Across Implementations and Applications?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
J. Mach. Learn. Res., 2020

Talking-Heads Attention.
CoRR, 2020

GLU Variants Improve Transformer.
CoRR, 2020

Faster Transformer Decoding: N-gram Masked Self-Attention.
CoRR, 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019
Fast Transformer Decoding: One Write-Head is All You Need.
CoRR, 2019

High Resolution Medical Image Analysis with Spatial Partitioning.
CoRR, 2019

Corpora Generation for Grammatical Error Correction.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Music Transformer: Generating Music with Long-Term Structure.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Weakly Supervised Grammatical Error Correction using Iterative Decoding.
CoRR, 2018

An Improved Relative Self-Attention Mechanism for Transformer with Application to Music Generation.
CoRR, 2018

Image Transformer.
CoRR, 2018

Blockwise Parallel Decoding for Deep Autoregressive Models.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Mesh-TensorFlow: Deep Learning for Supercomputers.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost.
Proceedings of the 35th International Conference on Machine Learning, 2018

Image Transformer.
Proceedings of the 35th International Conference on Machine Learning, 2018

Fast Decoding in Sequence Models Using Discrete Latent Variables.
Proceedings of the 35th International Conference on Machine Learning, 2018

Generating Wikipedia by Summarizing Long Sequences.
Proceedings of the 6th International Conference on Learning Representations, 2018

HydraNets: Specialized Dynamic Architectures for Efficient Inference.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Tensor2Tensor for Neural Machine Translation.
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas, 2018

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation.
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017
One Model To Learn Them All.
CoRR, 2017

Attention is All you Need.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Sparse Non-negative Matrix Language Modeling.
Trans. Assoc. Comput. Linguistics, 2016

Swivel: Improving Embeddings by Noticing What's Missing.
CoRR, 2016

Exploring the Limits of Language Modeling.
CoRR, 2016

NN-Grams: Unifying Neural Network and n-Gram Language Models for Speech Recognition.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

End-to-end text-dependent speaker verification.
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Sparse non-negative matrix language modeling for skip-grams.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Pruning sparse non-negative matrix n-gram language models.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Sparse non-negative matrix language modeling for geo-annotated query session data.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014
Skip-gram Language Modeling Using Sparse Non-negative Matrix Probability Estimation.
CoRR, 2014

2010
Variational Program Inference.
CoRR, 2010

2002
A probabilistic approach to solving crossword puzzles.
Artif. Intell., 2002

1999
Solving Crossword Puzzles as Probabilistic Constraint Satisfaction.
Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999

Solving Crosswords with PROVERB.
Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999

PROVERB: The Probabilistic Cruciverbalist.
Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence, 1999
