Jeffrey Pennington

According to our database, Jeffrey Pennington authored at least 52 papers between 2011 and 2024.

Bibliography

2024
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models.
Trans. Mach. Learn. Res., 2024

Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability.
CoRR, 2024

4+3 Phases of Compute-Optimal Neural Scaling Laws.
CoRR, 2024

High dimensional analysis reveals conservative sharpening and a stochastic edge of stability.
CoRR, 2024

Training LLMs over Neurally Compressed Text.
CoRR, 2024

Scaling Exponents Across Parameterizations and Optimizers.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Small-scale proxies for large-scale Transformer training instabilities.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Temperature check: theory and practice for training models with softmax-cross-entropy losses.
Trans. Mach. Learn. Res., 2023

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
CoRR, 2023

Second-order regression models exhibit progressive sharpening to the edge of stability.
Proceedings of the International Conference on Machine Learning, 2023

2022
Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression.
CoRR, 2022

Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm.
Proceedings of the International Conference on Machine Learning, 2022

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling.
Proceedings of the International Conference on Machine Learning, 2022

Anisotropic Random Feature Regression in High Dimensions.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Online education for data science: Opportunities and challenges.
Proceedings of the AMIA 2022, 2022

A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Covariate Shift in High-Dimensional Random Feature Regression.
CoRR, 2021

Overparameterization Improves Robustness to Covariate Shift in High Dimensions.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Exact posterior distributions of wide Bayesian neural networks.
CoRR, 2020

Finite Versus Infinite Neural Networks: an Empirical Study.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Understanding Double Descent Requires A Fine-Grained Bias-Variance Decomposition.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Disentangling Trainability and Generalization in Deep Neural Networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Disentangling trainability and generalization in deep learning.
CoRR, 2019

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning.
CoRR, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.
CoRR, 2019

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs.
CoRR, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Mean Field Theory of Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes.
Proceedings of the 7th International Conference on Learning Representations, 2019

KAMA-NNs: Low-dimensional Rotation Based Neural Networks.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes.
CoRR, 2018

The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks.
Proceedings of the 35th International Conference on Machine Learning, 2018

Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks.
Proceedings of the 35th International Conference on Machine Learning, 2018

Sensitivity and Generalization in Neural Networks: an Empirical Study.
Proceedings of the 6th International Conference on Learning Representations, 2018

Deep Neural Networks as Gaussian Processes.
Proceedings of the 6th International Conference on Learning Representations, 2018

The emergence of spectral universality in deep networks.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2018

2017
A Correspondence Between Random Neural Networks and Statistical Field Theory.
CoRR, 2017

Nonlinear random matrix theory for deep learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Geometry of Neural Network Loss Surfaces via Random Matrix Theory.
Proceedings of the 34th International Conference on Machine Learning, 2017

2016
Clinical Data Research Network Lessons Learned.
Proceedings of the Summit on Clinical Research Informatics, 2016

2015
Spherical Random Features for Polynomial Kernels.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

2014
GloVe: Global Vectors for Word Representation.
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014

2011
Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions.
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, 2011

