2024
Time Series Prediction of Gas Emission in Coal Mining Face Based on Optimized Variational Mode Decomposition and SSA-LSTM.
Sensors, October, 2024
Second-Order Min-Max Optimization with Lazy Hessians.
CoRR, 2024
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency.
CoRR, 2024
Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems.
CoRR, 2024
Towards Black-Box Membership Inference Attack for Diffusion Models.
CoRR, 2024
Functionally Constrained Algorithm Solves Convex Simple Bilevel Problem.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Online Control with Adversarial Disturbance for Continuous-time Linear Systems.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
A Quadratic Synchronization Rule for Distributed Deep Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis.
Proceedings of the Thirty Seventh Annual Conference on Learning Theory, June 30, 2024
2023
Sion's Minimax Theorem in Geodesic Metric Spaces and a Riemannian Extragradient Algorithm.
SIAM J. Optim., December, 2023
Two Phases of Scaling Laws for Nearest Neighbor Classifiers.
CoRR, 2023
Near-Optimal Fully First-Order Algorithms for Finding Stationary Points in Bilevel Optimization.
CoRR, 2023
Online Control with Adversarial Disturbance for Continuous-time Linear Systems.
CoRR, 2023
Lower Generalization Bounds for GD and SGD in Smooth Stochastic Convex Optimization.
CoRR, 2023
On Bilevel Optimization without Lower-level Strong Convexity.
CoRR, 2023
On the Overlooked Pitfalls of Weight Decay and How to Mitigate Them: A Gradient-Norm Perspective.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Iteratively Learn Diverse Strategies with State Distance Information.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
2022
Optimization Theory and Machine Learning Practice: Mind the Gap.
PhD thesis, 2022
Online Policy Optimization for Robust MDP.
CoRR, 2022
Realistic Deep Learning May Not Fit Benignly.
CoRR, 2022
Minimax in Geodesic Metric Spaces: Sion's Theorem and Algorithms.
CoRR, 2022
Detecting Electric Vehicle Battery Failure via Dynamic-VAE.
CoRR, 2022
Efficient Sampling on Riemannian Manifolds via Langevin MCMC.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Neural Network Weights Do Not Converge to Stationary Points: An Invariant Measure Perspective.
Proceedings of the International Conference on Machine Learning, 2022
Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity.
Proceedings of the International Conference on Machine Learning, 2022
Understanding the Unstable Convergence of Gradient Descent.
Proceedings of the International Conference on Machine Learning, 2022
2021
Monitoring, Analyzing, and Modeling for Single Subsidence Basin in Coal Mining Areas Based on SAR Interferometry with L-Band Data.
Sci. Program., 2021
On Convergence of Training Loss Without Reaching Stationary Points.
CoRR, 2021
Complexity Lower Bounds for Nonconvex-Strongly-Concave Min-Max Optimization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Fast Federated Learning in the Presence of Arbitrary Device Unavailability.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Provably Efficient Algorithms for Multi-Objective Competitive RL.
Proceedings of the 38th International Conference on Machine Learning, 2021
Coping with Label Shift via Distributionally Robust Optimisation.
Proceedings of the 9th International Conference on Learning Representations, 2021
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
2020
Stochastic Optimization with Non-stationary Noise.
CoRR, 2020
On Complexity of Finding Stationary Points of Nonsmooth Nonconvex Functions.
CoRR, 2020
Why are Adaptive Methods Good for Attention Models?
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Complexity of Finding Stationary Points of Nonconvex Nonsmooth Functions.
Proceedings of the 37th International Conference on Machine Learning, 2020
Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity.
Proceedings of the 8th International Conference on Learning Representations, 2020
2019
Why ADAM Beats SGD for Attention Models.
CoRR, 2019
Analysis of Gradient Clipping and Adaptive Scaling with a Relaxed Smoothness Condition.
CoRR, 2019
Quantifying Exposure Bias for Neural Language Generation.
CoRR, 2019
Acceleration in First Order Quasi-strongly Convex Optimization by ODE Discretization.
Proceedings of the 58th IEEE Conference on Decision and Control, 2019
Achieving Acceleration in Distributed Optimization via Direct Discretization of the Heavy-Ball ODE.
Proceedings of the 2019 American Control Conference, 2019
2018
A Probe Towards Understanding GAN and VAE Models.
CoRR, 2018
R-SPIDER: A Fast Riemannian Stochastic Optimization Algorithm with Curvature Independent Rate.
CoRR, 2018
Direct Runge-Kutta Discretization Achieves Acceleration.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018