Ohad Shamir

According to our database, Ohad Shamir authored at least 149 papers between 2007 and 2024.

Collaborative distances (shortest-path lengths in the coauthorship graph; see the sketch below):
  • Dijkstra number of three.
  • Erdős number of two.
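An author's Erdős number, for example, is the length of the shortest path from that author to Paul Erdős in the graph whose nodes are authors and whose edges join coauthors. As a minimal sketch, such a distance can be computed by breadth-first search; the graph below is a toy, made-up example, not real coauthorship data:

    from collections import deque

    def collaboration_distance(coauthors, source, target):
        """Shortest-path length between two authors in a coauthorship
        graph, via breadth-first search; None if not connected."""
        if source == target:
            return 0
        visited = {source}
        queue = deque([(source, 0)])
        while queue:
            author, dist = queue.popleft()
            for coauthor in coauthors.get(author, ()):
                if coauthor == target:
                    return dist + 1
                if coauthor not in visited:
                    visited.add(coauthor)
                    queue.append((coauthor, dist + 1))
        return None

    # Toy undirected graph with made-up edges (illustrative only):
    graph = {
        "A": ["B"],
        "B": ["A", "C"],
        "C": ["B", "Erdos"],
        "Erdos": ["C"],
    }
    print(collaboration_distance(graph, "A", "Erdos"))  # -> 3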

Bibliography

2024
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization.
J. Mach. Learn. Res., 2024

On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization.
CoRR, 2024

Generalization in Kernel Regression Under Realistic Assumptions.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Depth Separation in Norm-Bounded Infinite-Width Neural Networks.
Proceedings of the Thirty-Seventh Annual Conference on Learning Theory, 2024

Open Problem: Anytime Convergence Rate of Gradient Descent.
Proceedings of the Thirty-Seventh Annual Conference on Learning Theory, 2024

2023
The Implicit Bias of Benign Overfitting.
J. Mach. Learn. Res., 2023

Initialization-Dependent Sample Complexity of Linear Predictors and Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

From Tempered to Benign Overfitting in ReLU Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Accelerated Zeroth-order Method for Non-Smooth Stochastic Convex Optimization Problem with Infinite Variance.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Deterministic Nonsmooth Nonconvex Optimization.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

Implicit Regularization Towards Rank Minimization in ReLU Networks.
Proceedings of the International Conference on Algorithmic Learning Theory, 2023

2022
Oracle Complexity in Nonsmooth Nonconvex Optimization.
J. Mach. Learn. Res., 2022

On the Complexity of Finding Small Subgradients in Nonsmooth Optimization.
CoRR, 2022

Gradient Methods Provably Converge to Non-Robust Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On Margin Maximization in Linear and ReLU Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

The Sample Complexity of One-Hidden-Layer Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Reconstructing Training Data From Trained Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Elephant in the Room: Non-Smooth Non-Convex Optimization.
Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM 2022), 2022

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication (Extended Abstract).
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

On the Optimal Memorization Power of ReLU Neural Networks.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Width is Less Important than Depth in ReLU Neural Networks.
Proceedings of the Conference on Learning Theory, 2022

2021
Gradient Methods Never Overfit On Separable Data.
J. Mach. Learn. Res., 2021

Replay For Safety.
CoRR, 2021

Convergence Results For Q-Learning With Experience Replay.
CoRR, 2021

Size and Depth Separation in Approximating Natural Functions with Neural Networks.
CoRR, 2021

Learning a Single Neuron with Bias Using Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Random Shuffling Beats SGD Only After Many Epochs on Ill-Conditioned Problems.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

A Stochastic Newton Algorithm for Distributed Convex Optimization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

The Min-Max Complexity of Distributed Stochastic Convex Optimization with Intermittent Communication.
Proceedings of the Conference on Learning Theory, 2021

Implicit Regularization in ReLU Networks with the Square Loss.
Proceedings of the Conference on Learning Theory, 2021

Size and Depth Separation in Approximating Benign Functions with Neural Networks.
Proceedings of the Conference on Learning Theory, 2021

The Effects of Mild Over-parameterization on the Optimization Landscape of Shallow ReLU Neural Networks.
Proceedings of the Conference on Learning Theory, 2021

The Connection Between Approximation, Depth Separation and Learnability in Neural Networks.
Proceedings of the Conference on Learning Theory, 2021

2020
Neural Networks with Small Weights and Depth-Separation Barriers.
Electron. Colloquium Comput. Complex., 2020

High-Order Oracle Complexity of Smooth and Strongly Convex Optimization.
CoRR, 2020

Can We Find Near-Approximately-Stationary Points of Nonsmooth Nonconvex Functions?
CoRR, 2020

Is Local SGD Better than Minibatch SGD?
Proceedings of the 37th International Conference on Machine Learning, 2020

Proving the Lottery Ticket Hypothesis: Pruning is All You Need.
Proceedings of the 37th International Conference on Machine Learning, 2020

The Complexity of Finding Stationary Points with Stochastic Gradient Descent.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning a Single Neuron with Gradient Methods.
Proceedings of the Conference on Learning Theory, 2020

How Good is SGD with Random Shuffling?
Proceedings of the Conference on Learning Theory, 2020

A Tight Convergence Analysis for Stochastic Gradient Descent with Delayed Updates.
Proceedings of the Algorithmic Learning Theory, 2020

2019
Oracle complexity of second-order methods for smooth convex optimization.
Math. Program., 2019

Space lower bounds for linear prediction.
CoRR, 2019

On the Power and Limitations of Random Features for Understanding Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks.
Proceedings of the Conference on Learning Theory, 2019

Depth Separations in Neural Networks: What is Actually Being Separated?
Proceedings of the Conference on Learning Theory, 2019

The Complexity of Making the Gradient Small in Stochastic Convex Optimization.
Proceedings of the Conference on Learning Theory, 2019

Space lower bounds for linear prediction in the streaming model.
Proceedings of the Conference on Learning Theory, 2019

2018
Distribution-Specific Hardness of Learning Neural Networks.
J. Mach. Learn. Res., 2018

Are ResNets Provably Better than Linear Predictors?
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Global Non-convex Optimization with Discretized Diffusions.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Spurious Local Minima are Common in Two-Layer ReLU Neural Networks.
Proceedings of the 35th International Conference on Machine Learning, 2018

Size-Independent Sample Complexity of Neural Networks.
Proceedings of the Conference on Learning Theory, 2018

Detecting Correlations with Little Memory and Communication.
Proceedings of the Conference on Learning Theory, 2018

Bandit Regret Scaling with the Effective Loss Range.
Proceedings of the Algorithmic Learning Theory, 2018

2017
Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback.
SIAM J. Comput., 2017

An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback.
J. Mach. Learn. Res., 2017

Weight Sharing is Crucial to Successful Optimization.
CoRR, 2017

Failures of Deep Learning.
CoRR, 2017

Online Learning with Local Permutations and Delayed Feedback.
Proceedings of the 34th International Conference on Machine Learning, 2017

Failures of Gradient-Based Deep Learning.
Proceedings of the 34th International Conference on Machine Learning, 2017

Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks.
Proceedings of the 34th International Conference on Machine Learning, 2017

Communication-efficient Algorithms for Distributed Stochastic Principal Component Analysis.
Proceedings of the 34th International Conference on Machine Learning, 2017

Oracle Complexity of Second-Order Methods for Finite-Sum Problems.
Proceedings of the 34th International Conference on Machine Learning, 2017

Preface: Conference on Learning Theory (COLT), 2017.
Proceedings of the 30th Conference on Learning Theory, 2017

2016
Unified Algorithms for Online Learning and Competitive Analysis.
Math. Oper. Res., 2016

On Lower and Upper Bounds in Smooth and Strongly Convex Optimization.
J. Mach. Learn. Res., 2016

Without-Replacement Sampling for Stochastic Gradient Methods: Convergence Results and Application to Distributed Optimization.
CoRR, 2016

Depth Separation in ReLU Networks for Approximating Smooth Non-Linear Functions.
CoRR, 2016

Without-Replacement Sampling for Stochastic Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Dimension-Free Iteration Complexity of Finite Sum Optimization Problems.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Convergence of Stochastic Gradient Descent for PCA.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity.
Proceedings of the 33rd International Conference on Machine Learning, 2016

On the Quality of the Initial Basin in Overspecified Neural Networks.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Multi-Player Bandits - a Musical Chairs Approach.
Proceedings of the 33rd International Conference on Machine Learning, 2016

On the Iteration Complexity of Oblivious First-Order Optimization Algorithms.
Proceedings of the 33rd International Conference on Machine Learning, 2016

The Power of Depth for Feedforward Neural Networks.
Proceedings of the 29th Conference on Learning Theory, 2016

2015
The sample complexity of learning linear predictors with the squared loss.
J. Mach. Learn. Res., 2015

On Lower and Upper Bounds for Smooth and Strongly Convex Optimization Problems.
CoRR, 2015

Communication Complexity of Distributed Convex Learning and Optimization.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Attribute Efficient Linear Regression with Distribution-Dependent Sampling.
Proceedings of the 32nd International Conference on Machine Learning, 2015

On the Complexity of Bandit Linear Optimization.
Proceedings of the 28th Conference on Learning Theory, 2015

On the Complexity of Learning with Kernels.
Proceedings of the 28th Conference on Learning Theory, 2015

Graph Approximation and Clustering on a Budget.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

2014
Matrix completion with the trace norm: learning, bounding, and transducing.
J. Mach. Learn. Res., 2014

A Stochastic PCA Algorithm with an Exponential Convergence Rate.
CoRR, 2014

Attribute Efficient Linear Regression with Data-Dependent Sampling.
CoRR, 2014

Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

On the Computational Efficiency of Training Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Communication-Efficient Distributed Optimization using an Approximate Newton-type Method.
Proceedings of the 31st International Conference on Machine Learning, 2014

On-demand, Spot, or Both: Dynamic Resource Allocation for Executing Batch Jobs in the Cloud.
Proceedings of the 11th International Conference on Autonomic Computing, 2014

Distributed stochastic optimization and learning.
Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing, 2014

2013
A Provably Efficient Algorithm for Training Deep Networks.
CoRR, 2013

Accurate Profiling of Microbial Communities from Massively Parallel Sequencing Using Convex Optimization.
Proceedings of the String Processing and Information Retrieval, 2013

Online Learning with Switching Costs and Other Adaptive Adversaries.
Proceedings of the Advances in Neural Information Processing Systems 26: Annual Conference on Neural Information Processing Systems 2013, 2013

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes.
Proceedings of the 30th International Conference on Machine Learning, 2013

Probabilistic Label Trees for Efficient Large Scale Image Classification.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013

On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization.
Proceedings of the Conference on Learning Theory, 2013

Online Learning for Time Series Prediction.
Proceedings of the Conference on Learning Theory, 2013

Efficient Transductive Online Learning via Randomized Rounding.
Proceedings of the Empirical Inference - Festschrift in Honor of Vladimir N. Vapnik, 2013

Localization and Adaptation in Online Learning.
Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, 2013

2012
Learning from Weak Teachers.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?
Proceedings of the Conference on Learning Theory, 2012

Using More Data to Speed-up Training Time.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

There's a Hole in My Data Space: Piecewise Predictors for Heterogeneous Learning Problems.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

Optimal Distributed Online Prediction Using Mini-Batches.
J. Mach. Learn. Res., 2012

Relax and Localize: From Value to Algorithms.
CoRR, 2012

Relax and Randomize: From Value to Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 25: Annual Conference on Neural Information Processing Systems 2012, 2012

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization.
Proceedings of the 29th International Conference on Machine Learning, 2012

Decoupling Exploration and Exploitation in Multi-Armed Bandits.
Proceedings of the 29th International Conference on Machine Learning, 2012

2011
Online Learning of Noisy Data.
IEEE Trans. Inf. Theory, 2011

Learning Kernel-Based Halfspaces with the 0-1 Loss.
SIAM J. Comput., 2011

Spectral Clustering on a Budget.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Collaborative Filtering with the Trace Norm: Learning, Bounding, and Transducing.
Proceedings of the Conference on Learning Theory, 2011

Efficient Learning with Partially Observed Attributes.
J. Mach. Learn. Res., 2011

A Variant of Azuma's Inequality for Martingales with Subgaussian Tails.
CoRR, 2011

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization.
CoRR, 2011

From Bandits to Experts: On the Value of Side-Observations.
Proceedings of the Advances in Neural Information Processing Systems 24: Annual Conference on Neural Information Processing Systems 2011, 2011

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression.
Proceedings of the Advances in Neural Information Processing Systems 24: Annual Conference on Neural Information Processing Systems 2011, 2011

Learning with the weighted trace-norm under arbitrary sampling distributions.
Proceedings of the Advances in Neural Information Processing Systems 24: Annual Conference on Neural Information Processing Systems 2011, 2011

Better Mini-Batch Algorithms via Accelerated Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 24: Annual Conference on Neural Information Processing Systems 2011, 2011

Efficient Online Learning via Randomized Rounding.
Proceedings of the Advances in Neural Information Processing Systems 24: Annual Conference on Neural Information Processing Systems 2011, 2011

Learning Linear and Kernel Predictors with the 0-1 Loss Function.
Proceedings of the International Joint Conference on Artificial Intelligence, 2011

Adaptively Learning the Crowd Kernel.
Proceedings of the 28th International Conference on Machine Learning, 2011

Large-Scale Convex Minimization with a Low-Rank Constraint.
Proceedings of the 28th International Conference on Machine Learning, 2011

Optimal Distributed Online Prediction.
Proceedings of the 28th International Conference on Machine Learning, 2011

Quantity Makes Quality: Learning with Partial Views.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010
On stability in statistical machine learning (with an abstract and additional title page in Hebrew).
PhD thesis, 2010

Learning and generalization with the information bottleneck.
Theor. Comput. Sci., 2010

Stability and model selection in k-means clustering.
Mach. Learn., 2010

Learning to classify with missing and corrupted features.
Mach. Learn., 2010

Learnability, Stability and Uniform Convergence.
J. Mach. Learn. Res., 2010

Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Multiclass-Multilabel Classification with More Classes than Examples.
Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 2010

Robust Distributed Online Prediction.
CoRR, 2010

Learning Kernel-Based Halfspaces with the Zero-One Loss.
Proceedings of the Conference on Learning Theory, 2010

Online Learning of Noisy Data with Kernels.
Proceedings of the Conference on Learning Theory, 2010

2009
Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity.
CoRR, 2009

Good learners for evil teachers.
Proceedings of the 26th Annual International Conference on Machine Learning, 2009

Learnability and Stability in the General Learning Setting.
Proceedings of the Conference on Learning Theory, 2009

Stochastic Convex Optimization.
Proceedings of the Conference on Learning Theory, 2009

The Complexity of Improperly Learning Large Margin Halfspaces.
Proceedings of the Conference on Learning Theory, 2009

Vox Populi: Collecting High-Quality Labels from a Crowd.
Proceedings of the Conference on Learning Theory, 2009

2008
On the Reliability of Clustering Stability in the Large Sample Regime.
Proceedings of the Advances in Neural Information Processing Systems 21, 2008

Learning to classify with missing and corrupted features.
Proceedings of the 25th International Conference on Machine Learning, 2008

Model Selection and Stability in k-means Clustering.
Proceedings of the 21st Annual Conference on Learning Theory, 2008

2007
Cluster Stability for Finite Samples.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

