Libin Zhu

According to our database, Libin Zhu authored at least 15 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.



Bibliography

2024
Toward Understanding the Dynamics of Over-parameterized Neural Networks
PhD thesis, 2024

Emergence in non-neural models: grokking modular arithmetic via average gradient outer product.
CoRR, 2024

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Quadratic models for understanding catapult dynamics of neural networks.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Neural tangent kernel at initialization: linear width suffices.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

Restricted Strong Convexity of Deep Learning Models with Smooth Activations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Restricted Strong Convexity of Deep Learning Models with Smooth Activations.
CoRR, 2022

A note on Linear Bottleneck networks and their Transition to Multilinearity.
CoRR, 2022

Quadratic models for understanding neural network dynamics.
CoRR, 2022

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture.
CoRR, 2022

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models.
CoRR, 2022

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2020
Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning.
CoRR, 2020

On the linearity of large non-linear models: when and why the tangent kernel is constant.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
