Zhiyuan Li

ORCID: 0000-0001-8446-0319

Affiliations:
  • Toyota Technological Institute at Chicago (TTIC), IL, USA
  • Stanford University, Department of Computer Science, Stanford, CA, USA
  • Princeton University, Department of Computer Science, Princeton, NJ, USA (PhD 2022)


According to our database, Zhiyuan Li authored at least 43 papers between 2016 and 2024.

Bibliography

2024
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective.
CoRR, 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.
CoRR, 2024

Simplicity Bias via Global Convergence of Sharpness Minimization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

The Marginal Value of Momentum for Small Learning Rate SGD.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
The Inductive Bias of Flatness Regularization for Deep Matrix Factorization.
CoRR, 2023

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.
Advances in Neural Information Processing Systems 36, 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models.
Advances in Neural Information Processing Systems 36, 2023

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models.
Proceedings of the International Conference on Machine Learning, 2023

Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing.
Proceedings of the International Conference on Machine Learning, 2023

How Sharpness-Aware Minimization Minimizes Sharpness?
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
Bridging Theory and Practice in Deep Learning: Optimization and Generalization
PhD thesis, 2022

How Does Sharpness-Aware Minimization Minimize Sharpness?
CoRR, 2022

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction.
Advances in Neural Information Processing Systems 35, 2022

Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay.
Advances in Neural Information Processing Systems 35, 2022

Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent.
Advances in Neural Information Processing Systems 35, 2022

Understanding Gradient Descent on the Edge of Stability in Deep Learning.
Proceedings of the International Conference on Machine Learning, 2022

Robust Training of Neural Networks Using Scale Invariant Architectures.
Proceedings of the International Conference on Machine Learning, 2022

What Happens after SGD Reaches Zero Loss? --A Mathematical Framework.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
When is particle filtering efficient for planning in partially observed linear dynamical systems?
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias.
Advances in Neural Information Processing Systems 34, 2021

On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs).
Advances in Neural Information Processing Systems 34, 2021

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
When is Particle Filtering Efficient for POMDP Sequential Planning?
CoRR, 2020

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate.
Advances in Neural Information Processing Systems 33, 2020

Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee.
Proceedings of the 8th International Conference on Learning Representations, 2020

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks.
Proceedings of the 8th International Conference on Learning Representations, 2020

An Exponential Learning Rate Schedule for Deep Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Enhanced Convolutional Neural Tangent Kernels.
CoRR, 2019

Understanding Generalization of Deep Neural Networks Trained with Noisy Labels.
CoRR, 2019

Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets.
Advances in Neural Information Processing Systems 32, 2019

On Exact Computation with an Infinitely Wide Neural Net.
Advances in Neural Information Processing Systems 32, 2019

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks.
Proceedings of the 36th International Conference on Machine Learning, 2019

The role of over-parametrization in generalization of neural networks.
Proceedings of the 7th International Conference on Learning Representations, 2019

Theoretical Analysis of Auto Rate-Tuning by Batch Normalization.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks.
CoRR, 2018

Online Improper Learning with an Approximation Oracle.
Advances in Neural Information Processing Systems 31, 2018

2017
Stability of Generalized Two-sided Markets with Transaction Thresholds.
Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, 2017

2016
Fast Convergence of Common Learning Algorithms in Games.
CoRR, 2016

Solving Marginal MAP Problems with NP Oracles and Parity Constraints.
Advances in Neural Information Processing Systems 29, 2016

Learning in Games: Robustness of Fast Convergence.
Advances in Neural Information Processing Systems 29, 2016

