Yuanzhi Li

Mengqi Yuan

Xinming Qian

Sensors, 2023

TinyGSM: achieving >80% on GSM8k with small language models.

CoRR, 2023

Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine.

CoRR, 2023

Positional Description Matters for Transformers Arithmetic.

CoRR, 2023

Simple Mechanisms for Representing, Indexing and Manipulating Concepts.

CoRR, 2023

Physics of Language Models: Part 3.2, Knowledge Manipulation.

CoRR, 2023

Textbooks Are All You Need II: phi-1.5 technical report.

CoRR, 2023

Efficient RLHF: Reducing the Memory Usage of PPO.

CoRR, 2023

Length Generalization in Arithmetic Transformers.

Samy Jelassi

Stéphane d'Ascoli

Carles Domingo-Enrich

Yuhuai Wu

Caio César Teodoro Mendes

François Charton

CoRR, 2023

Textbooks Are All You Need.

Suriya Gunasekar

Yi Zhang

Jyoti Aneja

CoRR, 2023

Why Clean Generalization and Robust Overfitting Both Happen in Adversarial Training.

Binghui Li

CoRR, 2023

Toward Understanding Why Adam Converges Faster Than SGD for Transformers.

Yan Pan

CoRR, 2023

SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning.

CoRR, 2023

Physics of Language Models: Part 1, Context-Free Grammar.

CoRR, 2023

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Ronen Eldan

CoRR, 2023

Plan, Eliminate, and Track - Language Models are Good Teachers for Embodied Agents.

CoRR, 2023

On the Importance of Contrastive Loss in Multimodal Learning.

Yunwei Ren

CoRR, 2023

Sparks of Artificial General Intelligence: Early experiments with GPT-4.

CoRR, 2023

What Matters In The Structured Pruning of Generative Language Models?

CoRR, 2023

Learning Polynomial Transformations via Generalized Tensor Decompositions.

Proceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023

SPRING: Studying Papers and Reasoning to play Games.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Read and Reap the Rewards: Learning to Play Atari with the Help of Instruction Manuals.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

How Does Adaptive Optimization Impact Local Neural Network Geometry?

Kaiqi Jiang

Dhruv Malik

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The probability flow ODE is provably fast.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

The Benefits of Mixup for Feature Learning.

Proceedings of the International Conference on Machine Learning, 2023

Weighted Tallying Bandits: Overcoming Intractability via Repeated Exposure Optimality.

Proceedings of the International Conference on Machine Learning, 2023

How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding.

Yuchen Li

Proceedings of the International Conference on Machine Learning, 2023

Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022

Dissecting adaptive methods in GANs.

CoRR, 2022

Towards Understanding Mixture of Experts in Deep Learning.

CoRR, 2022

Learning Polynomial Transformations.

CoRR, 2022

The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning.

Zixin Wen

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Vision Transformers provably learn spatial structure.

Samy Jelassi

Michael E. Sander

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Understanding the Mixture-of-Experts Layer in Deep Learning.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Learning (Very) Simple Generative Models Is Hard.

Sitan Chen

Jerry Li

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Large-scale Security Measurements on the Android Firmware Ecosystem.

Proceedings of the 44th IEEE/ACM International Conference on Software Engineering, 2022

Towards understanding how momentum improves generalization in deep learning.

Samy Jelassi

Proceedings of the International Conference on Machine Learning, 2022

LoRA: Low-Rank Adaptation of Large Language Models.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Minimax Optimality (Probably) Doesn't Imply Distribution Learning for GANs.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Complete Policy Regret Bounds for Tallying Bandits.

Dhruv Malik

Aarti Singh

Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

2021

Near-optimal discrete optimization for experimental design: a regret minimization approach.

Math. Program., 2021

On the One-sided Convergence of Adam-type Algorithms in Non-convex Non-concave Min-max Optimization.

Zehao Dou

CoRR, 2021

LoRA: Low-Rank Adaptation of Large Language Models.

CoRR, 2021

A heuristic for statistical seriation.

Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021

PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections.

Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation, 2021

When Is Generalizable Reinforcement Learning Tractable?

Dhruv Malik

Pradeep Ravikumar

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Local Signal Adaptivity: Provable Feature Learning in Neural Networks Beyond Kernels.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning.

Zixin Wen

Proceedings of the 38th International Conference on Machine Learning, 2021

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity.

Proceedings of the 38th International Conference on Machine Learning, 2021

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability.

Proceedings of the 9th International Conference on Learning Representations, 2021

Mixed Deep Reinforcement Learning-behavior Tree for Intelligent Agents Design.

Proceedings of the 13th International Conference on Agents and Artificial Intelligence, 2021

A Highly Efficient Profiled Power Analysis Attack Based on Power Leakage Fitting.

Proceedings of the 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, 2021

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning.

Ruosong Wang

Lin F. Yang

Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, 2021

Feature Purification: How Adversarial Training Performs Robust Deep Learning.

Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, 2021

A Law of Robustness for Two-Layers Neural Networks.

Sébastien Bubeck

Dheeraj M. Nagaraj

Proceedings of the Conference on Learning Theory, 2021

2020

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK.

Hongyang R. Zhang

CoRR, 2020

When can Wasserstein GANs minimize Wasserstein Distance?

Zehao Dou

CoRR, 2020

Backward Feature Correction: How Deep Learning Performs Deep Learning.

CoRR, 2020

Chasing Nested Convex Bodies Nearly Optimally.

Proceedings of the 2020 ACM-SIAM Symposium on Discrete Algorithms, 2020

Learning Over-Parametrized Two-Layer Neural Networks beyond NTK.

Hongyang R. Zhang

Proceedings of the Conference on Learning Theory, 2020

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without.

Proceedings of the Conference on Learning Theory, 2020

2019

Competitively chasing convex bodies.

Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks.

Colin Wei

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Complexity of Highly Parallel Non-Smooth Convex Optimization.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

On the Convergence Rate of Training Recurrent Neural Networks.

Zhao Song

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Convergence Theory for Deep Learning via Over-Parameterization.

Zhao Song

Proceedings of the 36th International Conference on Machine Learning, 2019

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees.

Proceedings of the 7th International Conference on Learning Representations, 2019

Near Optimal Methods for Minimizing Convex Functions with Lipschitz $p$-th Derivatives.

Alexander V. Gasnikov

Pavel E. Dvurechensky

Eduard Gorbunov

Evgeniya A. Vorontsova

Proceedings of the Conference on Learning Theory, 2019

Improved Path-length Regret Bounds for Bandits.

Proceedings of the Conference on Learning Theory, 2019

Near-optimal method for highly smooth convex optimization.

Proceedings of the Conference on Learning Theory, 2019

2018

On the ability of gradient descent to learn neural networks.

PhD thesis, 2018

Linear Algebraic Structure of Word Senses, with Applications to Polysemy.

Trans. Assoc. Comput. Linguistics, 2018

Geodesic cycles in random graphs.

Lingsheng Shi

Discret. Math., 2018

Chasing Nested Convex Bodies Nearly Optimally.

CoRR, 2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees.

CoRR, 2018

An homotopy method for $\ell_p$ regression provably beyond self-concordance and in input-sparsity time.

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing.

Ankit Garg

Rafael Mendes de Oliveira

Avi Wigderson

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model.

Xi Chen

Jieming Mao

Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 2018

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Online Improper Learning with an Approximation Oracle.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

NEON2: Finding Local Minima via First-Order Oracles.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Well-Tempered Lasso.

Yoram Singer

Proceedings of the 35th International Conference on Machine Learning, 2018

An Alternative View: When Does SGD Escape Local Minima?

Robert Kleinberg

Yang Yuan

Proceedings of the 35th International Conference on Machine Learning, 2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits.

Sébastien Bubeck

Proceedings of the 35th International Conference on Machine Learning, 2018

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations.

Hongyang Zhang

Proceedings of the Conference on Learning Theory, 2018

Learning Mixtures of Linear Regressions with Nearly Optimal Complexity.

Proceedings of the Conference on Learning Theory, 2018

Sparsity, variance and curvature in multi-armed bandits.

Sébastien Bubeck

Michael B. Cohen

Proceedings of Algorithmic Learning Theory, 2018

2017

Does Corporate Governance Matter More for High Financial Slack Firms?

Kose John

Jiaren Pang

Manag. Sci., 2017

Algorithmic Regularization in Over-parameterized Matrix Recovery.

Hongyang Zhang

CoRR, 2017

Follow the Compressed Leader: Faster Algorithms for Matrix Multiplicative Weight Updates.

CoRR, 2017

A Nearly Instance Optimal Algorithm for Top-k Ranking under the Multinomial Logit Model.

Xi Chen

Jieming Mao

CoRR, 2017

Convergence Analysis of Two-layer Neural Networks with ReLU Activation.

Yang Yuan

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations.

Proceedings of the 34th International Conference on Machine Learning, 2017

Near-Optimal Design of Experiments via Regret Minimization.

Proceedings of the 34th International Conference on Machine Learning, 2017

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU.

Proceedings of the 34th International Conference on Machine Learning, 2017

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation.

Proceedings of the 34th International Conference on Machine Learning, 2017

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition.

Proceedings of the 34th International Conference on Machine Learning, 2017

Much Faster Algorithms for Matrix Scaling.

Rafael Mendes de Oliveira

Avi Wigderson

Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

First Efficient Convergence for Streaming k-PCA: A Global, Gap-Free, and Near-Optimal Rate.

Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

2016

A Latent Variable Model Approach to PMI-based Word Embeddings.

Trans. Assoc. Comput. Linguistics, 2016

Faster Principal Component Regression via Optimal Polynomial Approximation to sgn(x).

CoRR, 2016

Fast Global Convergence of Online PCA.

CoRR, 2016

An optimal algorithm for bandit convex optimization.

Elad Hazan

CoRR, 2016

Even Faster SVD Decomposition Yet Without Agonizing Pain.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Algorithms and matching lower bounds for approximately-convex optimization.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Recovery guarantee of weighted low-rank approximation via alternating minimization.

Proceedings of the 33rd International Conference on Machine Learning, 2016

2015

Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings.

CoRR, 2015

2014

An Automatic News Analysis and Opinion Sharing System for Exchange Rate Analysis.

Proceedings of the 11th IEEE International Conference on e-Business Engineering, 2014

2013

A Theoretical Analysis of NDCG Type Ranking Measures.

CoRR, 2013

A Theoretical Analysis of NDCG Type Ranking Measures.

Proceedings of COLT 2013, 2013

2007

An Empirical Research of Factors Influencing the Decision-Making of Chinese Online Shoppers.

Hui Chen