Mikhail Belkin

According to our database, Mikhail Belkin authored at least 117 papers between 2001 and 2024.

Awards

ACM Fellow

ACM Fellow 2023, "For contributions to modern machine learning theory and algorithms".

Bibliography

2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product.
CoRR, 2024

Average gradient outer product as a mechanism for deep neural collapse.
CoRR, 2024

Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination.
CoRR, 2024

Linear Recursive Feature Machines provably recover low-rank matrices.
CoRR, 2024

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Quadratic models for understanding catapult dynamics of neural networks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

More is Better: when Infinite Overparameterization is Optimal and Overfitting is Obligatory.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

On the Nyström Approximation for Preconditioning in Kernel Machines.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors.
SIAM J. Math. Data Sci., December, 2023

On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions.
SIAM J. Math. Data Sci., December, 2023

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory.
CoRR, 2023

Mechanism of feature learning in convolutional neural networks.
CoRR, 2023

Aiming towards the minimizers: fast convergence of SGD for overparametrized problems.
CoRR, 2023

On Emergence of Clean-Priority Learning in Early Stopped Neural Networks.
CoRR, 2023

Neural tangent kernel at initialization: linear width suffices.
Proceedings of the Uncertainty in Artificial Intelligence, 2023

Cut your Losses with Squentropy.
Proceedings of the International Conference on Machine Learning, 2023

Toward Large Kernel Models.
Proceedings of the International Conference on Machine Learning, 2023

2022
Feature learning in neural networks and kernel machines that recursively learn features.
CoRR, 2022

Restricted Strong Convexity of Deep Learning Models with Smooth Activations.
CoRR, 2022

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting.
CoRR, 2022

A note on Linear Bottleneck networks and their Transition to Multilinearity.
CoRR, 2022

Kernel Ridgeless Regression is Inconsistent for Low Dimensions.
CoRR, 2022

Quadratic models for understanding neural network dynamics.
CoRR, 2022

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture.
CoRR, 2022

Wide and Deep Neural Networks Achieve Optimality for Classification.
CoRR, 2022

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models.
CoRR, 2022

Limitations of Neural Collapse for Understanding Generalization in Deep Learning.
CoRR, 2022

Benign Overfitting in Two-layer Convolutional Neural Networks.
CoRR, 2022

2021
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation.
Acta Numer., May, 2021

Classification vs regression in overparameterized regimes: Does the loss function matter?
J. Mach. Learn. Res., 2021

Local Quadratic Convergence of Stochastic Gradient Descent with Adaptive Step Size.
CoRR, 2021

Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks.
CoRR, 2021

Multiple Descent: Design Your Own Generalization Curve.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Evaluation of Neural Architectures trained with square Loss vs Cross-Entropy in Classification Tasks.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Two Models of Double Descent for Weak Features.
SIAM J. Math. Data Sci., 2020

Overparameterized neural networks implement associative memory.
Proc. Natl. Acad. Sci. USA, 2020

Back to the Future: Radial Basis Function Network Revisited.
IEEE Trans. Pattern Anal. Mach. Intell., 2020

Linear Convergence and Implicit Regularization of Generalized Mirror Descent with Time-Dependent Mirrors.
CoRR, 2020

Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning.
CoRR, 2020

On the linearity of large non-linear models: when and why the tangent kernel is constant.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Accelerating SGD with momentum for over-parameterized learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Overparameterized Neural Networks Can Implement Associative Memory.
CoRR, 2019

Kernel Machines That Adapt To GPUs For Effective Large Batch Training.
Proceedings of the Second Conference on Machine Learning and Systems, SysML 2019, 2019

Kernel Machines Beat Deep Neural Networks on Mask-Based Single-Channel Speech Enhancement.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Does data interpolation contradict statistical optimality?
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Eigenvectors of Orthogonally Decomposable Functions.
SIAM J. Comput., 2018

Reconciling modern machine learning and the bias-variance trade-off.
CoRR, 2018

On exponential convergence of SGD in non-convex over-parametrized learning.
CoRR, 2018

MaSS: an Accelerated Stochastic Method for Over-parametrized Learning.
CoRR, 2018

Downsampling leads to Image Memorization in Convolutional Autoencoders.
CoRR, 2018

Learning kernels that adapt to GPU.
CoRR, 2018

Parametrized Accelerated Methods Free of Condition Number.
CoRR, 2018

Fast Interactive Image Retrieval using large-scale unlabeled data.
CoRR, 2018

Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

To Understand Deep Learning We Need to Understand Kernel Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Approximation beats concentration? An approximation view on inference with smooth radial kernels.
Proceedings of the Conference On Learning Theory, 2018

Unperturbed: spectral analysis beyond Davis-Kahan.
Proceedings of the Algorithmic Learning Theory, 2018

2017
Diving into the shallows: a computational perspective on large-scale shallow learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016
Learning Privately from Multiparty Data.
CoRR, 2016

Clustering with Bregman Divergences: an Asymptotic Analysis.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Graphons, mergeons, and so on!
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Learning privately from multiparty data.
Proceedings of the 33rd International Conference on Machine Learning, 2016

Basis Learning as an Algorithmic Primitive.
Proceedings of the 29th Conference on Learning Theory, 2016

Back to the Future: Radial Basis Function Networks Revisited.
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016

The Hidden Convexity of Spectral Clustering.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Polynomial Learning of Distribution Families.
SIAM J. Comput., 2015

Optimal Recovery in Noisy ICA.
CoRR, 2015

Probabilistic Zero-shot Classification with Semantic Rankings.
CoRR, 2015

A Pseudo-Euclidean Iteration for Optimal Recovery in Noisy ICA.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Crowd-ML: A Privacy-Preserving Learning Framework for a Crowd of Smart Devices.
Proceedings of the 35th IEEE International Conference on Distributed Computing Systems, 2015

Microwave-Band Circuit-Level Semiconductor Laser Modeling.
Proceedings of the 2015 IEEE European Modelling Symposium, 2015

Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering.
Proceedings of The 28th Conference on Learning Theory, 2015

2014
Learning a Hidden Basis Through Imperfect Measurements: An Algorithmic Primitive.
CoRR, 2014

Learning with Fredholm Kernels.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures.
Proceedings of The 27th Conference on Learning Theory, 2014

2013
Heat flow and a faster algorithm to compute the surface area of a convex body.
Random Struct. Algorithms, 2013

Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Inverse Density as an Inverse Problem: the Fredholm Equation Approach.
Proceedings of the Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013

Blind Signal Separation in the Presence of Gaussian Noise.
Proceedings of the COLT 2013, 2013

2012
Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries.
Proceedings of the COLT 2012, 2012

Graph Laplacians on Singular Manifolds: Toward understanding complex spaces: graph Laplacians on manifolds with singularities and boundaries.
CoRR, 2012

Metric Based Automatic Event Segmentation.
Proceedings of the Mobile Computing, Applications, and Services, 2012

Automatic Annotation of Daily Activity from Smartphone-Based Multisensory Streams.
Proceedings of the Mobile Computing, Applications, and Services, 2012

2011
Semi-supervised Learning by Higher Order Regularization.
Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011

Laplacian Support Vector Machines Trained in the Primal.
J. Mach. Learn. Res., 2011

Behavior of Graph Laplacians on Manifolds with Boundary.
CoRR, 2011

Data Skeletonization via Reeb Graphs.
Proceedings of the Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a meeting held 12-14 December 2011, 2011

An iterated graph laplacian approach for ranking on manifolds.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

2010
On Learning with Integral Operators.
J. Mach. Learn. Res., 2010

Learning speaker normalization using semisupervised manifold alignment.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Toward Learning Gaussian Mixtures with Arbitrary Separation.
Proceedings of the COLT 2010, 2010

2009
Learning Gaussian Mixtures with Arbitrary Separation.
CoRR, 2009

Constructing Laplace operator from point clouds in R^d.
Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2009

Semi-supervised Learning using Sparse Eigenfunction Bases.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

A Note on Learning with Integral Operators.
Proceedings of the COLT 2009, 2009

2008
Towards a theoretical foundation for Laplacian-based manifold methods.
J. Comput. Syst. Sci., 2008

Component based shape retrieval using differential profiles.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008

Probabilistic mixtures of differential profiles for shape recognition.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Data spectroscopy: learning mixture models using eigenspaces of convolution operators.
Proceedings of the Machine Learning, 2008

Discrete laplace operator on meshed surfaces.
Proceedings of the 24th ACM Symposium on Computational Geometry, 2008

2007
The Value of Labeled and Unlabeled Examples when the Model is Imperfect.
Proceedings of the Advances in Neural Information Processing Systems 20, 2007

2006
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples.
J. Mach. Learn. Res., 2006

On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

Convergence of Laplacian Eigenmaps.
Proceedings of the Advances in Neural Information Processing Systems 19, 2006

2005
Margin Semi-Supervised Learning for Structured Variables.
Proceedings of the Advances in Neural Information Processing Systems 18, 2005

Beyond the point cloud: from transductive to semi-supervised learning.
Proceedings of the Machine Learning, 2005

2004
Semi-Supervised Learning on Riemannian Manifolds.
Mach. Learn., 2004

Limits of Spectral Clustering.
Proceedings of the Advances in Neural Information Processing Systems 17, 2004

Tikhonov regularization and semi-supervised learning on large graphs.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

On the Convergence of Spectral Clustering on Random Samples: The Normalized Case.
Proceedings of the Learning Theory, 17th Annual Conference on Learning Theory, 2004

Regularization and Semi-supervised Learning on Large Graphs.
Proceedings of the Learning Theory, 17th Annual Conference on Learning Theory, 2004

2003
Laplacian Eigenmaps for Dimensionality Reduction and Data Representation.
Neural Comput., 2003

2002
Using eigenvectors of the bigram graph to infer morpheme identity.
Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, 2002

Using Manifold Stucture for Partially Labeled Classification.
Proceedings of the Advances in Neural Information Processing Systems 15, 2002

2001
Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering.
Proceedings of the Advances in Neural Information Processing Systems 14 [Neural Information Processing Systems: Natural and Synthetic], 2001

