Tian Xu

ORCID: 0000-0001-9409-448X

Affiliations:
  • Nanjing University, National Key Laboratory for Novel Software Technology, China


According to our database, Tian Xu authored at least 21 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Model gradient: unified model and policy learning in model-based reinforcement learning.
Frontiers Comput. Sci., August 2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity.
CoRR, 2024

A survey on model-based reinforcement learning.
Sci. China Inf. Sci., 2024

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Limited Preference Aided Imitation Learning from Imperfect Demonstrations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

When is RL better than DPO in RLHF? A Representation and Optimization Perspective.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Policy Optimization in RLHF: The Impact of Out-of-preference Data.
CoRR, 2023

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
CoRR, 2023

Theoretical Analysis of Offline Imitation With Supplementary Dataset.
CoRR, 2023

Provably Efficient Adversarial Imitation Learning with Unknown Transitions.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Error Bounds of Imitating Policies and Environments for Reinforcement Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis.
CoRR, 2022

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle.
CoRR, 2022

Rethinking ValueDice: Does It Really Improve Performance?
CoRR, 2022

2021
Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions.
CoRR, 2021

Sparsity Prior Regularized Q-learning for Sparse Action Tasks.
CoRR, 2021

2020
Error Bounds of Imitating Policies and Environments.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
On Value Discrepancy of Imitation Learning.
CoRR, 2019

