Tian Xu

ORCID: 0000-0001-9409-448X

Affiliations:
  • Nanjing University, National Key Laboratory for Novel Software Technology, China


According to our database, Tian Xu authored at least 21 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Model gradient: unified model and policy learning in model-based reinforcement learning.
Frontiers Comput. Sci., August 2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity.
CoRR, 2024

A survey on model-based reinforcement learning.
Sci. China Inf. Sci., 2024

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Limited Preference Aided Imitation Learning from Imperfect Demonstrations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Reward-Consistent Dynamics Models are Strongly Generalizable for Offline Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

When is RL better than DPO in RLHF? A Representation and Optimization Perspective.
Proceedings of the Second Tiny Papers Track at ICLR 2024, 2024

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Policy Optimization in RLHF: The Impact of Out-of-preference Data.
CoRR, 2023

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models.
CoRR, 2023

Theoretical Analysis of Offline Imitation With Supplementary Dataset.
CoRR, 2023

Provably Efficient Adversarial Imitation Learning with Unknown Transitions.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

Imitation Learning from Imperfection: Theoretical Justifications and Algorithms.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Error Bounds of Imitating Policies and Environments for Reinforcement Learning.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis.
CoRR, 2022

A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle.
CoRR, 2022

Rethinking ValueDice: Does It Really Improve Performance?
CoRR, 2022

2021
Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions.
CoRR, 2021

Sparsity Prior Regularized Q-learning for Sparse Action Tasks.
CoRR, 2021

2020
Error Bounds of Imitating Policies and Environments.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
On Value Discrepancy of Imitation Learning.
CoRR, 2019

