Zeyuan Allen Zhu

Orcid: 0000-0003-3002-089X

Affiliations:
  • MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, USA


According to our database1, Zeyuan Allen Zhu authored at least 92 papers between 2009 and 2024.

Collaborative distances:

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

Online presence:

On csauthors.net:

Bibliography

2024
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems.
CoRR, 2024

Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process.
CoRR, 2024

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws.
CoRR, 2024

Reverse Training to Nurse the Reversal Curse.
CoRR, 2024

Physics of Language Models: Part 3.1, Knowledge Storage and Extraction.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
SALSA VERDE: a machine learning attack on Learning with Errors with sparse small secrets.
IACR Cryptol. ePrint Arch., 2023

Physics of Language Models: Part 3.2, Knowledge Manipulation.
CoRR, 2023

Physics of Language Models: Part 1, Context-Free Grammar.
CoRR, 2023

SALSA VERDE: a machine learning attack on LWE with sparse small secrets.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Forward Super-Resolution: How Can GANs Learn Hierarchical Generative Models for Real-World Distributions.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning.
Proceedings of the Thirty Sixth Annual Conference on Learning Theory, 2023

2022
LoRA: Low-Rank Adaptation of Large Language Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Near-optimal discrete optimization for experimental design: a regret minimization approach.
Math. Program., 2021

LoRA: Low-Rank Adaptation of Large Language Models.
CoRR, 2021

Byzantine-Resilient Non-Convex Stochastic Gradient Descent.
Proceedings of the 9th International Conference on Learning Representations, 2021

Feature Purification: How Adversarial Training Performs Robust Deep Learning.
Proceedings of the 62nd IEEE Annual Symposium on Foundations of Computer Science, 2021

2020
Backward Feature Correction: How Deep Learning Performs Deep Learning.
CoRR, 2020

2019
Nearly linear-time packing and covering LP solvers - Achieving width-independence and -convergence.
Math. Program., 2019

On the Convergence Rate of Training Recurrent Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

What Can ResNet Learn Efficiently, Going Beyond Kernels?
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

A Convergence Theory for Deep Learning via Over-Parameterization.
Proceedings of the 36th International Conference on Machine Learning, 2019

2018
How To Make the Gradients Small Stochastically.
CoRR, 2018

Operator scaling via geodesically convex optimization, invariant theory and polynomial identity testing.
Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018

Is Q-Learning Provably Efficient?
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

The Lingering of Gradients: How to Reuse Gradients Over Time.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

NEON2: Finding Local Minima via First-Order Oracles.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Natasha 2: Faster Non-Convex Optimization Than SGD.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

How To Make the Gradients Small Stochastically: Even Faster Convex and Nonconvex SGD.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Byzantine Stochastic Gradient Descent.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits.
Proceedings of the 35th International Conference on Machine Learning, 2018

Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Katyusha: The First Direct Acceleration of Stochastic Gradient Methods.
J. Mach. Learn. Res., 2017

Follow the Compressed Leader: Faster Algorithms for Matrix Multiplicative Weight Updates.
CoRR, 2017

Finding approximate local minima faster than gradient descent.
Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, 2017

Linear Convergence of a Frank-Wolfe Type Algorithm over Trace-Norm Balls.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent.
Proceedings of the 8th Innovations in Theoretical Computer Science Conference, 2017

Near-Optimal Design of Experiments via Regret Minimization.
Proceedings of the 34th International Conference on Machine Learning, 2017

Follow the Compressed Leader: Faster Online Learning of Eigenvectors and Faster MMWU.
Proceedings of the 34th International Conference on Machine Learning, 2017

Faster Principal Component Regression and Stable Matrix Chebyshev Approximation.
Proceedings of the 34th International Conference on Machine Learning, 2017

Doubly Accelerated Methods for Faster CCA and Generalized Eigendecomposition.
Proceedings of the 34th International Conference on Machine Learning, 2017

Natasha: Faster Non-Convex Stochastic Optimization via Strongly Non-Convex Parameter.
Proceedings of the 34th International Conference on Machine Learning, 2017

Much Faster Algorithms for Matrix Scaling.
Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

First Efficient Convergence for Streaming k-PCA: A Global, Gap-Free, and Near-Optimal Rate.
Proceedings of the 58th IEEE Annual Symposium on Foundations of Computer Science, 2017

2016
Restricted Isometry Property for General p-Norms.
IEEE Trans. Inf. Theory, 2016

Reconstructing Markov processes from independent and anonymous experiments.
Discret. Appl. Math., 2016

Faster Principal Component Regression via Optimal Polynomial Approximation to sgn(x).
CoRR, 2016

Fast Global Convergence of Online PCA.
CoRR, 2016

Katyusha: Accelerated Variance Reduction for Faster SGD.
CoRR, 2016

Finding Approximate Local Minima for Nonconvex Optimization in Linear Time.
CoRR, 2016

Expanders via Local Edge Flips.
Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016

Using Optimization to Obtain a Width-Independent, Parallel, Simpler, and Faster Positive SDP Solver.
Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016

Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Even Faster SVD Decomposition Yet Without Agonizing Pain.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Optimal Black-Box Reductions Between Optimization Objectives.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives.
Proceedings of the 33nd International Conference on Machine Learning, 2016

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling.
Proceedings of the 33nd International Conference on Machine Learning, 2016

Variance Reduction for Faster Non-Convex Optimization.
Proceedings of the 33nd International Conference on Machine Learning, 2016

Optimization Algorithms for Faster Computational Geometry.
Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming, 2016

2015
Shorter arithmetization of nondeterministic computations.
Theor. Comput. Sci., 2015

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling.
CoRR, 2015

UniVR: A Universal Variance Reduction Framework for Proximal Stochastic Gradient Method.
CoRR, 2015

Nearly-Linear Time Positive LP Solver with Faster Convergence Rate.
Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, 2015

Spectral Sparsification and Regret Minimization Beyond Matrix Multiplicative Updates.
Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, 2015

Using Optimization to Break the Epsilon Barrier: A Faster and Simpler Width-Independent Algorithm for Solving Positive Linear Programs in Parallel.
Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, 2015

2014
Nearly-Linear Time Packing and Covering LP Solver with Faster Convergence Rate Than $O(1/\varepsilon^2)$.
CoRR, 2014

A Novel, Simple Interpretation of Nesterov's Accelerated Method as a Combination of Gradient and Mirror Descent.
CoRR, 2014

Using Optimization to Find Maximum Inscribed Balls and Minimum Enclosing Balls.
CoRR, 2014

Johnson-Lindenstrauss Compression with Neuroscience-Based Constraints.
CoRR, 2014

Knightian Robustness of the Vickrey Mechanism.
CoRR, 2014

Knightian Robustness of Single-Parameter Domains.
CoRR, 2014

Knightian Analysis of the VCG Mechanism in Unrestricted Combinatorial Auctions.
CoRR, 2014

Knightian Robustness from Regret Minimization.
CoRR, 2014

Bridging Utility Maximization and Regret Minimization.
CoRR, 2014

Flow-Based Algorithms for Local Graph Clustering.
Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, 2014

Knightian self uncertainty in the vcg mechanism for unrestricted combinatorial auctions.
Proceedings of the ACM Conference on Economics and Computation, 2014

2013
A simple, combinatorial algorithm for solving SDD systems in nearly-linear time.
Proceedings of the Symposium on Theory of Computing Conference, 2013

A Local Algorithm for Finding Well-Connected Clusters.
Proceedings of the 30th International Conference on Machine Learning, 2013

2012
Randomized accuracy-aware program transformations for efficient approximate computations.
Proceedings of the 39th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 2012

Mechanism design with approximate valuations.
Proceedings of the Innovations in Theoretical Computer Science 2012, 2012

2011
Knightian Auctions
CoRR, 2011

Optimal Pricing in Social Networks with Incomplete Information.
Proceedings of the Internet and Network Economics - 7th International Workshop, 2011

2010
Survey & Experiment: Towards the Learning Accuracy
CoRR, 2010

Pricing in Social Networks: Equilibrium and Revenue Maximization
CoRR, 2010

A novel click model and its applications to online advertising.
Proceedings of the Third International Conference on Web Search and Web Data Mining, 2010

Asymptotically optimal strategy-proof mechanisms for two-facility games.
Proceedings of the Proceedings 11th ACM Conference on Electronic Commerce (EC-2010), 2010

2009
Inverse Time Dependency in Convex Regularized Learning.
Proceedings of the ICDM 2009, 2009

P-packSVM: Parallel Primal grAdient desCent Kernel SVM.
Proceedings of the ICDM 2009, 2009

A general magnitude-preserving boosting algorithm for search ranking.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

To divide and conquer search ranking by learning query difficulty.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009


  Loading...