Understanding Optimization in Deep Learning with Central Flows.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Computational-Statistical Gaps in Gaussian Single-Index Models.
CoRR, 2024
How Transformers Learn Causal Structure with Gradient Descent.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Computational-Statistical Gaps in Gaussian Single-Index Models (Extended Abstract).
Proceedings of the Thirty-seventh Annual Conference on Learning Theory, 2024
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
Fine-Tuning Language Models with Just Forward Passes.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
Smoothing the Landscape Boosts the Signal for SGD: Optimal Sample Complexity for Learning Single Index Models.
Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023
Self-Stabilization: The Implicit Bias of Gradient Descent at the Edge of Stability.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Neural Networks can Learn Representations with Gradient Descent.
CoRR, 2022
Label Noise SGD Provably Prefers Flat Global Minimizers.
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021