Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization.
CoRR, January, 2025
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback.
CoRR, January, 2025
HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025
Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward.
IEEE Trans. Inf. Theory, September, 2024
Faster algorithm and sharper analysis for constrained Markov decision process.
Oper. Res. Lett., 2024
Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following.
CoRR, 2024
Constraint-based multi-agent reinforcement learning for collaborative tasks.
Comput. Animat. Virtual Worlds, 2023
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward.
CoRR, 2022
Deterministic policy gradient: Convergence analysis.
Proceedings of the Uncertainty in Artificial Intelligence, 2022
A Unifying Framework of Off-Policy General Value Function Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Model-Based Offline Meta-Reinforcement Learning with Regularization.
Proceedings of the Tenth International Conference on Learning Representations, 2022
PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method.
Proceedings of the Tenth International Conference on Learning Representations, 2022
A Unified Off-Policy Evaluation Approach for General Value Function.
CoRR, 2021
Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality.
Proceedings of the 38th International Conference on Machine Learning, 2021
CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee.
Proceedings of the 38th International Conference on Machine Learning, 2021
Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry.
Proceedings of the 9th International Conference on Learning Representations, 2021
Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021
When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis.
CoRR, 2020
Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization.
CoRR, 2020
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms.
CoRR, 2020
Improving Sample Complexity Bounds for Actor-Critic Algorithms.
CoRR, 2020
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
Reanalysis of Variance Reduced Temporal Difference Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020
Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation.
CoRR, 2019
Finite-Sample Analysis for SARSA with Linear Function Approximation.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
Convergence of SGD in Learning ReLU Models with Separable Data.
CoRR, 2018