Yuxin Cui

Sensors, October, 2024

Second-Order Min-Max Optimization with Lazy Hessians.

Chengchang Liu

CoRR, 2024

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency.

CoRR, 2024

Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems.

CoRR, 2024

Towards Black-Box Membership Inference Attack for Diffusion Models.

CoRR, 2024

Functionally Constrained Algorithm Solves Convex Simple Bilevel Problem.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Online Control with Adversarial Disturbance for Continuous-time Linear Systems.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning.

Jing Xu

Proceedings of the Forty-first International Conference on Machine Learning, 2024

A Quadratic Synchronization Rule for Distributed Deep Learning.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis.

Jing Xu

Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024

2023

Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm.

Peiyuan Zhang

SIAM J. Optim., December, 2023

Two Phases of Scaling Laws for Nearest Neighbor Classifiers.

Pengkun Yang

CoRR, 2023

Near-Optimal Fully First-Order Algorithms for Finding Stationary Points in Bilevel Optimization.

Yaohua Ma

CoRR, 2023

Online Control with Adversarial Disturbance for Continuous-time Linear Systems.

CoRR, 2023

Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization.

Peiyuan Zhang

Jiaye Teng

CoRR, 2023

On Bilevel Optimization without Lower-level Strong Convexity.

Jing Xu

CoRR, 2023

On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Iteratively Learn Diverse Strategies with State Distance Information.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models.

Kaiyue Wen

Jiaye Teng

Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022

Optimization Theory and Machine Learning Practice: Mind the Gap.

PhD thesis, 2022

Online Policy Optimization for Robust MDP.

CoRR, 2022

Realistic Deep Learning May Not Fit Benignly.

Kaiyue Wen

Jiaye Teng

CoRR, 2022

Minimax in Geodesic Metric Spaces: Sion's Theorem and Algorithms.

Peiyuan Zhang

CoRR, 2022

Detecting Electric Vehicle Battery Failure via Dynamic-VAE.

CoRR, 2022

Efficient Sampling on Riemannian Manifolds via Langevin MCMC.

Xiang Cheng

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective.

Proceedings of the International Conference on Machine Learning, 2022

Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity.

Proceedings of the International Conference on Machine Learning, 2022

Understanding the Unstable Convergence of Gradient Descent.

Kwangjun Ahn

Proceedings of the International Conference on Machine Learning, 2022

2021

Monitoring, Analyzing, and Modeling for Single Subsidence Basin in Coal Mining Areas Based on SAR Interferometry with L-Band Data.

Sci. Program., 2021

On Convergence of Training Loss Without Reaching Stationary Points.

CoRR, 2021

Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Fast Federated Learning in the Presence of Arbitrary Device Unavailability.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Provably Efficient Algorithms for Multi-Objective Competitive RL.

Proceedings of the 38th International Conference on Machine Learning, 2021

Coping with Label Shift via Distributionally Robust Optimisation.

Proceedings of the 9th International Conference on Learning Representations, 2021

Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Stochastic Optimization with Non-stationary Noise.

CoRR, 2020

On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions.

CoRR, 2020

Why are Adaptive Methods Good for Attention Models?

Sai Praneeth Karimireddy

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions.

Proceedings of the 37th International Conference on Machine Learning, 2020

Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity.

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Why ADAM Beats SGD for Attention Models.

Sai Praneeth Karimireddy

CoRR, 2019

Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition.

CoRR, 2019

Quantifying Exposure Bias for Neural Language Generation.

CoRR, 2019

Acceleration in First Order Quasi-strongly Convex Optimization by ODE Discretization.

Ali Jadbabaie

Proceedings of the 58th IEEE Conference on Decision and Control, 2019

Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE.

Proceedings of the 2019 American Control Conference, 2019

2018

A Probe Towards Understanding GAN and VAE Models.

Lu Mi

Macheng Shen

CoRR, 2018

R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate.

Hongyi Zhang