Tengyu Ma

Arvind V. Mahankali

CoRR, 2024

Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective.

CoRR, 2024

Linguistic Calibration of Language Models.

CoRR, 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.

CoRR, 2024

Linguistic Calibration of Long-Form Generations.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention.

Arvind V. Mahankali

Tatsunori Hashimoto

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training.

Hong Liu

David Leo Wright Hall

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Large Language Models as Tool Makers.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Chain of Thought Empowers Transformers to Solve Inherently Serial Problems.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time.

CoRR, 2023

The Inductive Bias of Flatness Regularization for Deep Matrix Factorization.

CoRR, 2023

Toward L<sub>∞</sub>-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields.

CoRR, 2023

Larger language models do in-context learning differently.

CoRR, 2023

Data Selection for Language Models via Importance Resampling.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization.

Kaiyue Wen

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Beyond NTK with Vanilla Gradient Descent: A Mean-Field Analysis of Neural Networks with Polynomial Width, Samples, and Time.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

What is the Inductive Bias of Flatness Regularization? A Study of Deep Matrix Factorization Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models.

Proceedings of the International Conference on Machine Learning, 2023

How Sharpness-Aware Minimization Minimizes Sharpness?

Kaiyue Wen

Proceedings of the Eleventh International Conference on Learning Representations, 2023

A theoretical study of inductive biases in contrastive learning.

Jeff Z. HaoChen

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

First Steps Toward Understanding the Extrapolation of Nonlinear Models to Unseen Domains.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

What learning algorithm is in-context learning? Investigations with linear models.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Symbol tuning improves in-context learning in language models.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Toward L<sub>∞</sub>-recovery of Nonlinear Functions: A Polynomial Sample Complexity Bound for Gaussian Random Fields.

Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022

On the optimization landscape of tensor decompositions.

Math. Program., 2022

How Does Sharpness-Aware Minimization Minimize Sharpness?

Kaiyue Wen

CoRR, 2022

Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift.

Proceedings of the Uncertainty in Artificial Intelligence, 2022

Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers.

Yining Chen

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation.

Proceedings of the International Conference on Machine Learning, 2022

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification.

Proceedings of the International Conference on Machine Learning, 2022

Near-Optimal Algorithms for Autonomous Exploration and Multi-Goal Stochastic Shortest Path.

Haoyuan Cai

Simon S. Du

Proceedings of the International Conference on Machine Learning, 2022

An Explanation of In-context Learning as Implicit Bayesian Inference.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Self-supervised Learning is More Robust to Dataset Imbalance.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution.

Proceedings of the Tenth International Conference on Learning Representations, 2022

DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective.

Margalit R. Glasgow

Honglin Yuan

Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021

Joint System-Wise Optimization for Pipeline Goal-Oriented Dialog System.

CoRR, 2021

Why Do Local Methods Solve Nonconvex Problems?

CoRR, 2021

Entity and Evidence Guided Document-Level Relation Extraction.

Proceedings of the 6th Workshop on Representation Learning for NLP, 2021

Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning.

Sang Michael Xie

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Safe Reinforcement Learning by Imagining the Near Future.

Garrett Thomas

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature.

Jiaqi Yang

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Label Noise SGD Provably Prefers Flat Global Minimizers.

Alex Damian

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Variance-reduced First-order Meta-learning for Natural Language Processing Tasks.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Composed Fine-Tuning: Freezing Pre-Trained Denoising Autoencoders for Improved Generalization.

Sang Michael Xie

Proceedings of the 38th International Conference on Machine Learning, 2021

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness.

Proceedings of the 9th International Conference on Learning Representations, 2021

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data.

Proceedings of the 9th International Conference on Learning Representations, 2021

Optimal Regularization can Mitigate Double Descent.

Proceedings of the 9th International Conference on Learning Representations, 2021

Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization.

Proceedings of the 9th International Conference on Learning Representations, 2021

Fine-Grained Gap-Dependent Bounds for Tabular MDPs via Adaptive Multi-Step Bootstrap.

Haike Xu

Simon S. Du

Proceedings of the Conference on Learning Theory, 2021

Shape Matters: Understanding the Implicit Bias of the Noise Covariance.

Proceedings of the Conference on Learning Theory, 2021

Active Online Learning with Hidden Shifting Domains.

Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

Meta-learning Transferable Representations with a Single Target Domain.

CoRR, 2020

Entity and Evidence Guided Relation Extraction for DocRED.

CoRR, 2020

Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK.

Hongyang R. Zhang

CoRR, 2020

Simplifying Models with Unlabeled Output Data.

Sang Michael Xie

CoRR, 2020

Active Online Domain Adaptation.

CoRR, 2020

Robust and On-the-fly Dataset Denoising for Image Classification.

CoRR, 2020

Federated Accelerated Stochastic Gradient Descent.

Honglin Yuan

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

MOPO: Model-based Offline Policy Optimization.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Beyond Lazy Training for Over-parameterized Tensor Decomposition.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Model-based Adversarial Meta-Reinforcement Learning.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Self-training Avoids Using Spurious Features Under Domain Shift.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Individual Calibration with Randomized Forecasting.

Shengjia Zhao

Stefano Ermon

Proceedings of the 37th International Conference on Machine Learning, 2020

The Implicit and Explicit Regularization Effects of Dropout.

Sham M. Kakade

Proceedings of the 37th International Conference on Machine Learning, 2020

Understanding Self-Training for Gradual Domain Adaptation.

Ananya Kumar

Proceedings of the 37th International Conference on Machine Learning, 2020

On the Expressivity of Neural Networks for Deep Reinforcement Learning.

Proceedings of the 37th International Conference on Machine Learning, 2020

Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin.

Proceedings of the 8th International Conference on Learning Representations, 2020

Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling.

Huazhe Xu

Proceedings of the 8th International Conference on Learning Representations, 2020

Robust and On-the-Fly Dataset Denoising for Image Classification.

Proceedings of the Computer Vision - ECCV 2020, 2020

Learning Over-Parametrized Two-Layer Neural Networks beyond NTK.

Hongyang R. Zhang

Proceedings of the Conference on Learning Theory, 2020

Why Do Local Methods Solve Nonconvex Problems?

Beyond the Worst-Case Analysis of Algorithms, 2020

2019

Optimal Design of Process Flexibility for General Production Systems.

Oper. Res., 2019

Bootstrapping the Expressivity with Model-based Planning.

CoRR, 2019

Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin.

CoRR, 2019

A Model-based Approach for Sample-efficient Multi-task Reinforcement Learning.

Nicholas C. Landolfi

Garrett Thomas

CoRR, 2019

Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Verified Uncertainty Calibration.

Ananya Kumar

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Fixup Initialization: Residual Learning Without Normalization.

Hongyi Zhang

Yann N. Dauphin

Proceedings of the 7th International Conference on Learning Representations, 2019

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees.

Proceedings of the 7th International Conference on Learning Representations, 2019

Approximability of Discriminators Implies Diversity in GANs.

Yu Bai

Andrej Risteski

Proceedings of the 7th International Conference on Learning Representations, 2019

On the Performance of Thompson Sampling on Logistic Bandits.

Shi Dong

Benjamin Van Roy

Proceedings of the Conference on Learning Theory, 2019

2018

Linear Algebraic Structure of Word Senses, with Applications to Polysemy.

Trans. Assoc. Comput. Linguistics, 2018

Gradient Descent Learns Linear Dynamical Systems.

Moritz Hardt

Benjamin Recht

J. Mach. Learn. Res., 2018

On the Margin Theory of Feedforward Neural Networks.

CoRR, 2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees.

CoRR, 2018

Generalization and equilibrium in generative adversarial nets (GANs) (invited talk).

Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Learning One-hidden-layer Neural Networks with Landscape Design.

Proceedings of the 6th International Conference on Learning Representations, 2018

Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations.

Hongyang Zhang

Proceedings of the Conference on Learning Theory, 2018

A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017

Non-convex Optimization for Machine Learning: Design, Analysis, and Understanding

PhD thesis, 2017

Distributed Stochastic Variance Reduced Gradient Methods by Sampling Extra Data with Replacement.

J. Mach. Learn. Res., 2017

Algorithmic Regularization in Over-parameterized Matrix Recovery.

Hongyang Zhang

CoRR, 2017

Provable learning of noisy-OR networks.

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Finding approximate local minima faster than gradient descent.

Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Generalization and Equilibrium in Generative Adversarial Nets (GANs).

Proceedings of the 34th International Conference on Machine Learning, 2017

Identity Matters in Deep Learning.

Moritz Hardt

Proceedings of the 5th International Conference on Learning Representations, 2017

A Simple but Tough-to-Beat Baseline for Sentence Embeddings.

Sanjeev Arora

Yingyu Liang

Proceedings of the 5th International Conference on Learning Representations, 2017

On the Ability of Neural Nets to Express Distributions.

Proceedings of the 30th Conference on Learning Theory, 2017

2016

A Latent Variable Model Approach to PMI-based Word Embeddings.

Trans. Assoc. Comput. Linguistics, 2016

The Simulated Greedy Algorithm for Several Submodular Matroid Secretary Problems.

Bo Tang

Yajun Wang

Theory Comput. Syst., 2016

Finding Approximate Local Minima for Nonconvex Optimization in Linear Time.

CoRR, 2016

Communication lower bounds for statistical estimation problems via a distributed data processing inequality.

Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, 2016

A Non-generative Framework and Convex Relaxations for Unsupervised Learning.

Elad Hazan

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Matrix Completion has No Spurious Local Minimum.

Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Provable Algorithms for Inference in Topic Models.

Proceedings of the 33rd International Conference on Machine Learning, 2016

Polynomial-Time Tensor Decompositions with Sum-of-Squares.

Jonathan Shi

David Steurer

Proceedings of the IEEE 57th Annual Symposium on Foundations of Computer Science, 2016

2015

Distributed Stochastic Variance Reduced Gradient Methods.

Qihang Lin

CoRR, 2015

Why are deep nets reversible: A simple theory, with implications for training.

Sanjeev Arora

Yingyu Liang

CoRR, 2015

Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings.

CoRR, 2015

Sum-of-Squares Lower Bounds for Sparse PCA.

Avi Wigderson

Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015

Online Learning of Eigenvectors.

Dan Garber

Elad Hazan

Proceedings of the 32nd International Conference on Machine Learning, 2015

Simple, Efficient, and Neural Algorithms for Sparse Coding.

Proceedings of the 28th Conference on Learning Theory, 2015

Decomposing Overcomplete 3rd Order Tensors using Sum-of-Squares Algorithms.

Proceedings of the Approximation, 2015

2014

Lower Bound for High-Dimensional Statistical Learning Problem via Direct-Sum Theorem.

Ankit Garg

Huy L. Nguyen

CoRR, 2014

More Algorithms for Provable Dictionary Learning.

CoRR, 2014

On Communication Cost of Distributed Statistical Estimation and Dimensionality.

Ankit Garg

Huy L. Nguyen

Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Provable Bounds for Learning Some Deep Representations.

Proceedings of the 31st International Conference on Machine Learning, 2014

2013

On a conjecture of Butler and Graham.

Xiaoming Sun

Huacheng Yu

Des. Codes Cryptogr., 2013

2011

A New Variation of Hat Guessing Games.
