Yuanzhi Li

ORCID: 0009-0004-4418-9308

According to our database, Yuanzhi Li authored at least 143 papers between 2007 and 2024.

Bibliography

2024
Specifying and Solving Robust Empirical Risk Minimization Problems Using CVXPY.
J. Optim. Theory Appl., September, 2024

Lower Bounds in the Query-with-Sketch Model and a Barrier in Derandomizing BPL.
Electron. Colloquium Comput. Complex., 2024

Mixture of Parrots: Experts improve memorization more than reasoning.
CoRR, 2024

O1 Replication Journey: A Strategic Progress Report - Part 1.
CoRR, 2024

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks.
CoRR, 2024

Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data.
CoRR, 2024

Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts.
CoRR, 2024

Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems.
CoRR, 2024

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process.
CoRR, 2024

How Does Overparameterization Affect Features?
CoRR, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.
CoRR, 2024

AgentKit: Flow Engineering with Graphs, not Coding.
CoRR, 2024

VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
CoRR, 2024

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws.
CoRR, 2024

Provably learning a multi-head attention layer.
CoRR, 2024

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

SmartPlay: A Benchmark for LLMs as Intelligent Agents.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Can We Trust the Phone Vendors? Comprehensive Security Measurements on the Android Firmware Ecosystem.
IEEE Trans. Software Eng., July, 2023

Detection of Gas Pipeline Leakage Using Distributed Optical Fiber Sensors: Multi-Physics Analysis of Leakage-Fiber Coupling Mechanism in Soil Environment.
Sensors, 2023

TinyGSM: achieving >80% on GSM8k with small language models.
CoRR, 2023

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine.
CoRR, 2023

Positional Description Matters for Transformers Arithmetic.
CoRR, 2023

Simple Mechanisms for Representing, Indexing and Manipulating Concepts.
CoRR, 2023

Physics of Language Models: Part 3.2, Knowledge Manipulation.
CoRR, 2023

Textbooks Are All You Need II: phi-1.5 technical report.
CoRR, 2023

Efficient RLHF: Reducing the Memory Usage of PPO.
CoRR, 2023

Length Generalization in Arithmetic Transformers.
CoRR, 2023

Textbooks Are All You Need.
CoRR, 2023

Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training.
CoRR, 2023

Toward Understanding Why Adam Converges Faster Than SGD for Transformers.
CoRR, 2023

SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning.
CoRR, 2023

Physics of Language Models: Part 1, Context-Free Grammar.
CoRR, 2023

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
CoRR, 2023

Plan, Eliminate, and Track - Language Models are Good Teachers for Embodied Agents.
CoRR, 2023

On the Importance of Contrastive Loss in Multimodal Learning.
CoRR, 2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4.
CoRR, 2023

What Matters In The Structured Pruning of Generative Language Models?
CoRR, 2023

Learning Polynomial Transformations via Generalized Tensor Decompositions.
Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023

SPRING: Studying Papers and Reasoning to play Games.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

How Does Adaptive Optimization Impact Local Neural Network Geometry?
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The probability flow ODE is provably fast.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Benefits of Mixup for Feature Learning.
Proceedings of the International Conference on Machine Learning, 2023

Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality.
Proceedings of the International Conference on Machine Learning, 2023

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding.
Proceedings of the International Conference on Machine Learning, 2023

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning.
Proceedings of the Thirty-Sixth Annual Conference on Learning Theory, 2023

2022
Dissecting adaptive methods in GANs.
CoRR, 2022

Towards Understanding Mixture of Experts in Deep Learning.
CoRR, 2022

Learning Polynomial Transformations.
CoRR, 2022

The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Vision Transformers provably learn spatial structure.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Understanding the Mixture-of-Experts Layer in Deep Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning (Very) Simple Generative Models Is Hard.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Large-scale Security Measurements on the Android Firmware Ecosystem.
Proceedings of the 44th IEEE/ACM International Conference on Software Engineering, 2022

Towards understanding how momentum improves generalization in deep learning.
Proceedings of the International Conference on Machine Learning, 2022

LoRA: Low-Rank Adaptation of Large Language Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Complete Policy Regret Bounds for Tallying Bandits.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

2021
Near-optimal discrete optimization for experimental design: a regret minimization approach.
Math. Program., 2021

On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization.
CoRR, 2021

LoRA: Low-Rank Adaptation of Large Language Models.
CoRR, 2021

A heuristic for statistical seriation.
Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections.
Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

When Is Generalizable Reinforcement Learning Tractable?
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity.
Proceedings of the 38th International Conference on Machine Learning, 2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability.
Proceedings of the 9th International Conference on Learning Representations, 2021

Mixed Deep Reinforcement Learning-behavior Tree for Intelligent Agents Design.
Proceedings of the 13th International Conference on Agents and Artificial Intelligence, 2021

A Highly Efficient Profiled Power Analysis Attack Based on Power Leakage Fitting.
Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning.
Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, 2021

Feature Purification: How Adversarial Training Performs Robust Deep Learning.
Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, 2021

A Law of Robustness for Two-Layers Neural Networks.
Proceedings of the Conference on Learning Theory, 2021

2020
Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK.
CoRR, 2020

When can Wasserstein GANs minimize Wasserstein Distance?
CoRR, 2020

Backward Feature Correction: How Deep Learning Performs Deep Learning.
CoRR, 2020

Chasing Nested Convex Bodies Nearly Optimally.
Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, 2020

Learning Over-Parametrized Two-Layer Neural Networks beyond NTK.
Proceedings of the Conference on Learning Theory, 2020

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without.
Proceedings of the Conference on Learning Theory, 2020

2019
Competitively chasing convex bodies.
Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Complexity of Highly Parallel Non-Smooth Convex Optimization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Convergence Rate of Training Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Convergence Theory for Deep Learning via Over-Parameterization.
Proceedings of the 36th International Conference on Machine Learning, 2019

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees.
Proceedings of the 7th International Conference on Learning Representations, 2019

Near Optimal Methods for Minimizing Convex Functions with Lipschitz $p$-th Derivatives.
Proceedings of the Conference on Learning Theory, 2019

Improved Path-length Regret Bounds for Bandits.
Proceedings of the Conference on Learning Theory, 2019

Near-optimal method for highly smooth convex optimization.
Proceedings of the Conference on Learning Theory, 2019

2018
On the ability of gradient descent to learn neural networks.
PhD thesis, 2018

Linear Algebraic Structure of Word Senses, with Applications to Polysemy.
Trans. Assoc. Comput. Linguistics, 2018

Geodesic cycles in random graphs.
Discret. Math., 2018

Chasing Nested Convex Bodies Nearly Optimally.
CoRR, 2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees.
CoRR, 2018

An homotopy method for $\ell_p$ regression provably beyond self-concordance and in input-sparsity time.
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing.
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model.
Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Online Improper Learning with an Approximation Oracle.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

NEON2: Finding Local Minima via First-Order Oracles.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Well-Tempered Lasso.
Proceedings of the 35th International Conference on Machine Learning, 2018

An Alternative View: When Does SGD Escape Local Minima?
Proceedings of the 35th International Conference on Machine Learning, 2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits.
Proceedings of the 35th International Conference on Machine Learning, 2018

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations.
Proceedings of the Conference on Learning Theory, 2018

Learning Mixtures of Linear Regressions with Nearly Optimal Complexity.
Proceedings of the Conference on Learning Theory, 2018

Sparsity, variance and curvature in multi-armed bandits.
Proceedings of the Algorithmic Learning Theory, 2018

2017
Does Corporate Governance Matter More for High Financial Slack Firms?
Manag. Sci., 2017

Algorithmic Regularization in Over-parameterized Matrix Recovery.
CoRR, 2017

Follow the Compressed Leader: Faster Algorithms for Matrix Multiplicative Weight Updates.
CoRR, 2017

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model.
CoRR, 2017

Convergence Analysis of Two-layer Neural Networks with ReLU Activation.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations.
Proceedings of the 34th International Conference on Machine Learning, 2017

Near-Optimal Design of Experiments via Regret Minimization.
Proceedings of the 34th International Conference on Machine Learning, 2017

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU.
Proceedings of the 34th International Conference on Machine Learning, 2017

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition.
Proceedings of the 34th International Conference on Machine Learning, 2017

Much Faster Algorithms for Matrix Scaling.
Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

First Efficient Convergence for Streaming k-PCA: A Global, Gap-Free, and Near-Optimal Rate.
Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

2016
A Latent Variable Model Approach to PMI-based Word Embeddings.
Trans. Assoc. Comput. Linguistics, 2016

Faster Principal Component Regression via Optimal Polynomial Approximation to sgn(x).
CoRR, 2016

Fast Global Convergence of Online PCA.
CoRR, 2016

An optimal algorithm for bandit convex optimization.
CoRR, 2016

Even Faster SVD Decomposition Yet Without Agonizing Pain.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Algorithms and matching lower bounds for approximately-convex optimization.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Recovery guarantee of weighted low-rank approximation via alternating minimization.
Proceedings of the 33rd International Conference on Machine Learning, 2016

2015
Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings.
CoRR, 2015

2014
An Automatic News Analysis and Opinion Sharing System for Exchange Rate Analysis.
Proceedings of the 11th IEEE International Conference on e-Business Engineering, 2014

2013
A Theoretical Analysis of NDCG Type Ranking Measures.
CoRR, 2013

A Theoretical Analysis of NDCG Type Ranking Measures.
Proceedings of COLT 2013, 2013

2007
An Empirical Research of Factors Influencing the Decision-Making of Chinese Online Shoppers.
Proceedings of the Integration and Innovation Orient to E-Society, 2007

