2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization.
CoRR, January, 2025

Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback.
CoRR, January, 2025

HyperZero: A Customized End-to-End Auto-Tuning System for Recommendation with Hourly Feedback.
Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, V.1, 2025

2024
Provably Efficient Offline Reinforcement Learning With Trajectory-Wise Reward.
IEEE Trans. Inf. Theory, September, 2024

Faster algorithm and sharper analysis for constrained Markov decision process.
Oper. Res. Lett., 2024

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following.
CoRR, 2024

The Perfect Blend: Redefining RLHF with Mixture of Judges.
CoRR, 2024

2023
Constraint-based multi-agent reinforcement learning for collaborative tasks.
Comput. Animat. Virtual Worlds, 2023

2022
Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward.
CoRR, 2022

Deterministic policy gradient: Convergence analysis.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2022

A Unifying Framework of Off-Policy General Value Function Evaluation.
Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Model-Based Offline Meta-Reinforcement Learning with Regularization.
Proceedings of the Tenth International Conference on Learning Representations, 2022

PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
A Unified Off-Policy Evaluation Approach for General Value Function.
CoRR, 2021

Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality.
Proceedings of the 38th International Conference on Machine Learning, 2021

CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee.
Proceedings of the 38th International Conference on Machine Learning, 2021

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry.
Proceedings of the 9th International Conference on Learning Representations, 2021

Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
A Primal Approach to Constrained Policy Optimization: Global Optimality and Finite-Time Analysis.
CoRR, 2020

Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization.
CoRR, 2020

Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms.
CoRR, 2020

Improving Sample Complexity Bounds for Actor-Critic Algorithms.
CoRR, 2020

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms.
Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Reanalysis of Variance Reduced Temporal Difference Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Finite-Sample Analysis for SARSA and Q-Learning with Linear Function Approximation.
CoRR, 2019

Finite-Sample Analysis for SARSA with Linear Function Approximation.
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples.
Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

2018
Convergence of SGD in Learning ReLU Models with Separable Data.
CoRR, 2018