Zhiyuan Li

David Leo Wright Hall

Percy Liang

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization.

CoRR, 2023

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.

Kaiyue Wen

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models.

Proceedings of the International Conference on Machine Learning, 2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.

Proceedings of the International Conference on Machine Learning, 2023

How Sharpness-Aware Minimization Minimizes Sharpness?

Kaiyue Wen

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Bridging Theory and Practice in Deep Learning: Optimization and Generalization

PhD thesis, 2022

How Does Sharpness-Aware Minimization Minimize Sharpness?

Kaiyue Wen

CoRR, 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay.

Tianhao Wang

Dingli Yu

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Understanding Gradient Descent on the Edge of Stability in Deep Learning.

Abhishek Panigrahi

Proceedings of the International Conference on Machine Learning, 2022

Robust Training of Neural Networks Using Scale Invariant Architectures.

Proceedings of the International Conference on Machine Learning, 2022

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework.

Tianhao Wang

Proceedings of the Tenth International Conference on Learning Representations, 2022

2021

When is particle filtering efficient for planning in partially observed linear dynamical systems?

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs).

Sadhika Malladi

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning.

Yuping Luo

Proceedings of the 9th International Conference on Learning Representations, 2021

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Yi Zhang

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

When is Particle Filtering Efficient for POMDP Sequential Planning?

CoRR, 2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee.

Wei Hu

Dingli Yu

Proceedings of the 8th International Conference on Learning Representations, 2020

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks.

Proceedings of the 8th International Conference on Learning Representations, 2020

An Exponential Learning Rate Schedule for Deep Learning.

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Enhanced Convolutional Neural Tangent Kernels.

CoRR, 2019

Understanding Generalization of Deep Neural Networks Trained with Noisy Labels.

Wei Hu

Dingli Yu

CoRR, 2019

Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On Exact Computation with an Infinitely Wide Neural Net.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks.

Proceedings of the 36th International Conference on Machine Learning, 2019

The role of over-parametrization in generalization of neural networks.

Proceedings of the 7th International Conference on Learning Representations, 2019

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization.
