Jascha Sohl-Dickstein

Affiliations:
  • Google Brain, Mountain View, CA, USA
  • UC Berkeley, Redwood Center for Theoretical Neuroscience, CA, USA (PhD 2012)


According to our database, Jascha Sohl-Dickstein authored at least 110 papers between 2010 and 2024.

Bibliography

2024
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models.
Trans. Mach. Learn. Res., 2024

Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability.
CoRR, 2024

Training LLMs over Neurally Compressed Text.
CoRR, 2024

The boundary of neural network trainability is fractal.
CoRR, 2024

Position: Levels of AGI for Operationalizing Progress on the Path to AGI.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Scaling Exponents Across Parameterizations and Optimizers.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Small-scale proxies for large-scale Transformer training instabilities.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"
CoRR, 2023

Levels of AGI: Operationalizing Progress on the Path to AGI.
CoRR, 2023

Noise-Reuse in Online Evolution Strategies.
CoRR, 2023

Variance-Reduced Gradient Estimation via Noise-Reuse in Online Evolution Strategies.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC.
Proceedings of the International Conference on Machine Learning, 2023

2022
General-Purpose In-Context Learning by Meta-Learning Transformers.
CoRR, 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up.
CoRR, 2022

Language Model Cascades.
CoRR, 2022

A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies (Extended Abstract).
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

Fast Finite Width Neural Tangent Kernel.
Proceedings of the International Conference on Machine Learning, 2022

Wide Bayesian neural networks have a simple weight posterior: theory and accelerated sampling.
Proceedings of the International Conference on Machine Learning, 2022

Practical Tradeoffs between Memory, Compute, and Performance in Learned Optimizers.
Proceedings of the Conference on Lifelong Learning Agents, 2022

2021
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation.
CoRR, 2021

Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping.
CoRR, 2021

Training Learned Optimizers with Randomly Initialized Learned Optimizers.
CoRR, 2021

Reverse engineering learned optimizers reveals known and novel mechanisms.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Whitening and Second Order Optimization Both Make Information in the Dataset Unusable During Training, and Can Reduce or Prevent Generalization.
Proceedings of the 38th International Conference on Machine Learning, 2021

Unbiased Gradient Estimation in Unrolled Computation Graphs with Persistent Evolution Strategies.
Proceedings of the 38th International Conference on Machine Learning, 2021

Score-Based Generative Modeling through Stochastic Differential Equations.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Parallel Training of Deep Networks with Local Updates.
CoRR, 2020

Towards NNGP-guided Neural Architecture Search.
CoRR, 2020

Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence.
CoRR, 2020

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves.
CoRR, 2020

Whitening and second order optimization both destroy information about the dataset, and can make generalization impossible.
CoRR, 2020

A new method for parameter estimation in probabilistic models: Minimum probability flow.
CoRR, 2020

Exact posterior distributions of wide Bayesian neural networks.
CoRR, 2020

The large learning rate phase of deep learning: the catapult mechanism.
CoRR, 2020

Using a thousand optimization tasks to learn hyperparameter search strategies.
CoRR, 2020

On the infinite width limit of neural networks with a standard parameterization.
CoRR, 2020

Finite Versus Infinite Neural Networks: an Empirical Study.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Infinite attention: NNGP and NTK for deep attention networks.
Proceedings of the 37th International Conference on Machine Learning, 2020

Neural Tangents: Fast and Easy Infinite Neural Networks in Python.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Measuring the Effects of Data Parallelism on Neural Network Training.
J. Mach. Learn. Res., 2019

Neural reparameterization improves structural optimization.
CoRR, 2019

Using learned optimizers to make models robust to input noise.
CoRR, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.
CoRR, 2019

Eliminating all bad Local Minima from Loss Landscapes without even adding an Extra Unit.
CoRR, 2019

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Invertible Convolutional Flow.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study.
Proceedings of the 36th International Conference on Machine Learning, 2019

Understanding and correcting pathologies in the training of learned optimizers.
Proceedings of the 36th International Conference on Machine Learning, 2019

Guided evolutionary strategies: augmenting random search with surrogate gradients.
Proceedings of the 36th International Conference on Machine Learning, 2019

A Mean Field Theory of Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes.
Proceedings of the 7th International Conference on Learning Representations, 2019

Meta-Learning Update Rules for Unsupervised Representation Learning.
Proceedings of the 7th International Conference on Learning Representations, 2019

Adversarial Reprogramming of Neural Networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

A RAD approach to deep mixture models.
Proceedings of the Deep Generative Models for Highly Structured Data, 2019

2018
Learned optimizers that outperform SGD on wall-clock and test loss.
CoRR, 2018

Bayesian Convolutional Neural Networks with Many Channels are Gaussian Processes.
CoRR, 2018

Guided evolutionary strategies: escaping the curse of dimensionality in random search.
CoRR, 2018

Stochastic natural gradient descent draws posterior samples in function space.
CoRR, 2018

Learning Unsupervised Learning Rules.
CoRR, 2018

Adversarial Examples that Fool both Human and Computer Vision.
CoRR, 2018

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

PCA of high dimensional random walks with comparison to neural network training.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks.
Proceedings of the 35th International Conference on Machine Learning, 2018

Sensitivity and Generalization in Neural Networks: an Empirical Study.
Proceedings of the 6th International Conference on Learning Representations, 2018

Learning to Learn Without Labels.
Proceedings of the 6th International Conference on Learning Representations, 2018

Generalizing Hamiltonian Monte Carlo with Neural Networks.
Proceedings of the 6th International Conference on Learning Representations, 2018

Deep Neural Networks as Gaussian Processes.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Minimum and Maximum Entropy Distributions for Binary Systems with Known Means and Pairwise Correlations.
Entropy, 2017

A Correspondence Between Random Neural Networks and Statistical Field Theory.
CoRR, 2017

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Understanding and Improvement.
CoRR, 2017

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Learned Optimizers that Scale and Generalize.
Proceedings of the 34th International Conference on Machine Learning, 2017

On the Expressive Power of Deep Neural Networks.
Proceedings of the 34th International Conference on Machine Learning, 2017

Input Switched Affine Networks: An RNN Architecture Designed for Interpretability.
Proceedings of the 34th International Conference on Machine Learning, 2017

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models.
Proceedings of the 5th International Conference on Learning Representations, 2017

Deep Information Propagation.
Proceedings of the 5th International Conference on Learning Representations, 2017

Unrolled Generative Adversarial Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

Explaining the Learning Dynamics of Direct Feedback Alignment.
Proceedings of the 5th International Conference on Learning Representations, 2017

Density estimation using Real NVP.
Proceedings of the 5th International Conference on Learning Representations, 2017

Capacity and Trainability in Recurrent Neural Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Survey of Expressivity in Deep Neural Networks.
CoRR, 2016

Improved generator objectives for GANs.
CoRR, 2016

A universal tradeoff between power, precision and speed in physical communication.
CoRR, 2016

Intelligible Language Modeling with Input Switched Affine Networks.
CoRR, 2016

Exponential expressivity in deep neural networks through transient chaos.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2015
A Device for Human Ultrasonic Echolocation.
IEEE Trans. Biomed. Eng., 2015

Technical Note on Equivalence Between Recurrent Neural Network Time Series Models and Variational Bayesian Models.
CoRR, 2015

Deep Knowledge Tracing.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Deep Unsupervised Learning using Nonequilibrium Thermodynamics.
Proceedings of the 32nd International Conference on Machine Learning, 2015

2014
Modeling Higher-Order Correlations within Cortical Microcolumns.
PLoS Comput. Biol., 2014

Analyzing noise in autoencoders and deep networks.
CoRR, 2014

Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods.
Proceedings of the 31st International Conference on Machine Learning, 2014

Hamiltonian Monte Carlo Without Detailed Balance.
Proceedings of the 31st International Conference on Machine Learning, 2014

2013
An adaptive low dimensional quasi-Newton sum of functions optimizer.
CoRR, 2013

Measurably Increasing Motivation in MOOCs.
Proceedings of the Workshops at the 16th International Conference on Artificial Intelligence in Education AIED 2013, 2013

Controlled experiments on millions of students to personalize learning.
Proceedings of the Workshops at the 16th International Conference on Artificial Intelligence in Education AIED 2013, 2013

2012
Efficient Methods for Unsupervised Learning of Probabilistic Models.
PhD thesis, 2012

Efficient Methods for Unsupervised Learning of Probabilistic Models.
CoRR, 2012

Hamiltonian Monte Carlo with Reduced Momentum Flips.
CoRR, 2012

Hamiltonian Annealed Importance Sampling for partition function estimation.
CoRR, 2012

The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use.
CoRR, 2012

Training sparse natural image models with a fast Gibbs sampler of an extended state space.
Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

2011
Minimum Probability Flow Learning.
Proceedings of the 28th International Conference on Machine Learning, 2011

Building a better probabilistic model of images by factorization.
Proceedings of the IEEE International Conference on Computer Vision, 2011

Lie Group Transformation Models for Predictive Video Coding.
Proceedings of the 2011 Data Compression Conference (DCC 2011), 2011

2010
An Unsupervised Algorithm For Learning Lie Group Transformations.
CoRR, 2010

