Daniel Soudry

According to our database¹, Daniel Soudry authored at least 81 papers between 2010 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

The Implicit Bias of Gradient Descent on Separable Multiclass Data.

CoRR, 2024

Provable Tempered Overfitting of Minimal Nets and Typical Nets.

William M. Hoza

CoRR, 2024

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks.

CoRR, 2024

Scaling FP8 training to trillion-token LLMs.

CoRR, 2024

Stable Minima Cannot Overfit in Univariate ReLU Networks: Generalization by Large Step Sizes.

CoRR, 2024

How Uniform Random Weights Induce Non-uniform Bias: Typical Interpolating Neural Networks Generalize with Narrow Teachers.

Mor Shpigel Nacson

Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting - An Analytical Model.

Daniel Goldfarb

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Towards Cheaper Inference in Deep Networks with Lower Bit-Width Accumulators.

Yaniv Blumenfeld

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Explore to Generalize in Zero-Shot RL.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

How do Minimum-Norm Shallow Denoisers Look in Function Space?

Yaniv Blumenfeld

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction.

Shahar Gottlieb

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Gradient Descent Monotonically Decreases the Sharpness of Gradient Flow Solutions in Scalar Networks and Beyond.

Mor Shpigel Nacson

Proceedings of the International Conference on Machine Learning, 2023

Continual Learning in Linear Classification on Separable Data.

Edward Moroshko

Proceedings of the International Conference on Machine Learning, 2023

The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks.

Mor Shpigel Nacson

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Minimum Variance Unbiased N:M Sparsity for the Neural Gradients.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats.

Hilla Ben-Yaacov

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

The Role of Codeword-to-Class Assignments in Error-Correcting Codes: An Empirical Study.

Tamar Weiss Orzech

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2023

2022

Optimal Fine-Grained N:M sparsity for Activations and Neural Gradients.

CoRR, 2022

Implicit Bias of the Step Size in Linear Diagonal Neural Networks.

Mor Shpigel Nacson, Kavya Ravichandran

Proceedings of the International Conference on Machine Learning, 2022

A Statistical Framework for Efficient Out of Distribution Detection in Deep Neural Networks.

Proceedings of the Tenth International Conference on Learning Representations, 2022

How catastrophic can catastrophic forgetting be in linear regression?

Edward Moroshko

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Task-Agnostic Continual Learning Using Online Variational Bayes With Fixed-Point Updates.

Neural Comput., 2021

Logarithmic Unbiased Quantization: Practical 4-bit Training in Deep Learning.

Hilla Ben-Yaacov

CoRR, 2021

Statistical Testing for Efficient Out of Distribution Detection in Deep Neural Networks.

CoRR, 2021

The Implicit Bias of Minima Stability: A View from Function Space.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Accurate Post Training Quantization With Small Calibration Sets.

Proceedings of the 38th International Conference on Machine Learning, 2021

On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent.

Edward Moroshko, Mor Shpigel Nacson, Blake E. Woodworth

Proceedings of the 38th International Conference on Machine Learning, 2021

Neural gradients are near-lognormal: improved quantized and sparse training.

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

The Global Optimization Geometry of Shallow Linear Neural Networks.

Yonina C. Eldar, Michael B. Wakin

J. Math. Imaging Vis., 2020

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming.

CoRR, 2020

Neural gradients are lognormally distributed: understanding sparse and quantized training.

CoRR, 2020

Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy.

Edward Moroshko, Blake E. Woodworth, Suriya Gunasekar

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization?

Yaniv Blumenfeld

Proceedings of the 37th International Conference on Machine Learning, 2020

A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case.

Rebecca Willett

Proceedings of the 8th International Conference on Learning Representations, 2020

At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?

Mor Shpigel Nacson

Proceedings of the 8th International Conference on Learning Representations, 2020

Augment Your Batch: Improving Generalization Through Instance Repetition.

Torsten Hoefler

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

The Knowledge Within: Methods for Data-Free Model Compression.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Kernel and Rich Regimes in Overparametrized Models.

Blake E. Woodworth, Suriya Gunasekar, Edward Moroshko

Proceedings of the Conference on Learning Theory, 2020

2019

MTJ-Based Hardware Synapse Design for Quantized Deep Neural Networks.

Tzofnat Greenberg-Toledo, Shahar Kvatinsky

CoRR, 2019

Mix & Match: training convnets with mixed image sizes for improved accuracy, speed and scale resiliency.

Berry Weinstein, Torsten Hoefler

CoRR, 2019

Augment your batch: better training with larger batches.

Torsten Hoefler

CoRR, 2019

A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off.

Yaniv Blumenfeld

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Post training 4-bit quantization of convolutional networks for rapid-deployment.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models.

Mor Shpigel Nacson, Suriya Gunasekar

Proceedings of the 36th International Conference on Machine Learning, 2019

How do infinite width bounded norm networks look in function space?

Proceedings of the Conference on Learning Theory, 2019

Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate.

Mor Shpigel Nacson

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

Convergence of Gradient Descent on Separable Data.

Mor Shpigel Nacson, Suriya Gunasekar, Pedro Henrique Pamplona Savarese

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018

Seizure pathways: A model-based investigation.

Philippa J. Karoly, David B. Grayden, Dean R. Freestone

PLoS Comput. Biol., 2018

The Implicit Bias of Gradient Descent on Separable Data.

Mor Shpigel Nacson, Suriya Gunasekar

J. Mach. Learn. Res., 2018

ACIQ: Analytical Clipping for Integer Quantization of neural networks.

CoRR, 2018

Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning.

CoRR, 2018

Convergence of Gradient Descent on Separable Data.

Mor Shpigel Nacson, Suriya Gunasekar

CoRR, 2018

On the Blindspots of Convolutional Networks.

CoRR, 2018

Norm matters: efficient and accurate normalization schemes in deep networks.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Implicit Bias of Gradient Descent on Linear Convolutional Networks.

Suriya Gunasekar

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Scalable methods for 8-bit training of neural networks.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Characterizing Implicit Bias in Terms of Optimization Geometry.

Suriya Gunasekar

Proceedings of the 35th International Conference on Machine Learning, 2018

The Implicit Bias of Gradient Descent on Separable Data.

Mor Shpigel Nacson

Proceedings of the 6th International Conference on Learning Representations, 2018

Exponentially vanishing sub-optimal local minima in multilayer neural networks.

Proceedings of the 6th International Conference on Learning Representations, 2018

Fix your classifier: the marginal value of training the last weight layer.

Proceedings of the 6th International Conference on Learning Representations, 2018

2017

Multi-scale approaches for high-speed imaging and analysis of large neural populations.

Johannes Friedrich, Misha B. Ahrens, Darcy S. Peterka

PLoS Comput. Biol., 2017

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations.

Matthieu Courbariaux

J. Mach. Learn. Res., 2017

The Implicit Bias of Gradient Descent on Separable Data.

CoRR, 2017

Train longer, generalize better: closing the generalization gap in large batch training of neural networks.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016

No bad local minima: Data independent training error guarantees for multilayer neural networks.

CoRR, 2016

Binarized Neural Networks.

Matthieu Courbariaux

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

A fully analog memristor-based neural network with online gradient training.

Sergey Greshnikov, Shahar Kvatinsky

Proceedings of the IEEE International Symposium on Circuits and Systems, 2016

2015

Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training.

Dotan Di Castro, Avinoam Kolodny, Shahar Kvatinsky

IEEE Trans. Neural Networks Learn. Syst., 2015

Efficient "Shotgun" Inference of Neural Connectivity from Highly Sub-sampled Activity Data.

Patrick Stinson

PLoS Comput. Biol., 2015

Training Binary Multilayer Neural Networks for Image Classification using Expectation Backpropagation.

CoRR, 2015

2014

The neuronal response at extended timescales: a linearized spiking input-output relation.

Frontiers Comput. Neurosci., 2014

The neuronal response at extended timescales: long-term correlations without long-term memory.

Frontiers Comput. Neurosci., 2014

Diffusion approximation-based simulation of stochastic ion channels: which method to use?

Frontiers Comput. Neurosci., 2014

Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights.

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

2012

Conductance-Based Neuron Models and the Slow Dynamics of Excitability.

Frontiers Comput. Neurosci., 2012

Neuronal spike generation mechanism as an oversampling, noise-shaping A-to-D converter.

Dmitri B. Chklovskii

Proceedings of the Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012

2010

History-Dependent Dynamics in a Generic Model of Ion Channels - An Analytic Study.

Frontiers Comput. Neurosci., 2010
