Nan Jiang

Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL, USA
  • University of Michigan, Ann Arbor, MI, USA (former)


According to our database, Nan Jiang authored at least 66 papers between 2014 and 2024.

Bibliography

2024
Model-Free Representation Learning and Exploration in Low-Rank MDPs.
J. Mach. Learn. Res., 2024

Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity.
CoRR, 2024

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning.
CoRR, 2024

RLHF Workflow: From Reward Modeling to Online RLHF.
CoRR, 2024

A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference.
CoRR, 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Harnessing Density Ratios for Online Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Mitigating the Alignment Tax of RLHF.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2023
Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF.
CoRR, 2023

Mitigating the Alignment Tax of RLHF.
CoRR, 2023

Future-Dependent Value-Based Off-Policy Evaluation in POMDPs.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adversarial Model for Offline Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reinforcement Learning in Low-rank MDPs with Density Features.
Proceedings of the International Conference on Machine Learning, 2023

The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation.
Proceedings of the International Conference on Machine Learning, 2023

The Role of Coverage in Online Reinforcement Learning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Extended Abstract: Learning in Low-rank MDPs with Density Features.
Proceedings of the 57th Annual Conference on Information Sciences and Systems, 2023

2022
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data.
CoRR, 2022

Offline reinforcement learning under value and density-ratio realizability: The power of gaps.
Proceedings of the Uncertainty in Artificial Intelligence, 2022

Interaction-Grounded Learning with Action-Inclusive Feedback.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Few Expert Queries Suffices for Sample-Efficient RL with Resets and Linear Value Approximation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes.
Proceedings of the International Conference on Machine Learning, 2022

Adversarially Trained Actor Critic for Offline Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

Towards Deployment-Efficient Reinforcement Learning: Lower Bound and Optimality.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Offline Reinforcement Learning with Realizability and Single-policy Concentrability.
Proceedings of the Conference on Learning Theory, 2-5 July 2022, London, UK, 2022

On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes.
CoRR, 2021

Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency.
CoRR, 2021

Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Bellman-consistent Pessimism for Offline Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Batch Value-function Approximation with Only Realizability.
Proceedings of the 38th International Conference on Machine Learning, 2021

On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function.
Proceedings of the Conference on Learning Theory, 2021

Minimax Model Learning.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

Improved Worst-Case Regret Bounds for Randomized Least-Squares Value Iteration.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting.
CoRR, 2020

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison.
CoRR, 2020

Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization.
CoRR, 2020

Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison.
Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Minimax Value Interval for Off-Policy Evaluation and Policy Optimization.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Minimax Weight and Q-Function Learning for Off-Policy Evaluation.
Proceedings of the 37th International Conference on Machine Learning, 2020

From Importance Sampling to Doubly Robust Policy Gradient.
Proceedings of the 37th International Conference on Machine Learning, 2020

Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

2019
Minimax Weight and Q-Function Learning for Off-Policy Evaluation.
CoRR, 2019

Provably Efficient Q-Learning with Low Switching Cost.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably efficient RL with Rich Observations via Latent State Decoding.
Proceedings of the 36th International Conference on Machine Learning, 2019

Information-Theoretic Considerations in Batch Reinforcement Learning.
Proceedings of the 36th International Conference on Machine Learning, 2019

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches.
Proceedings of the Conference on Learning Theory, 2019

2018
Model-Based Reinforcement Learning in Contextual Decision Processes.
CoRR, 2018

On Polynomial Time PAC Reinforcement Learning with Rich Observations.
CoRR, 2018

Completing State Representations using Spectral Learning.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

On Oracle-Efficient PAC RL with Rich Observations.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Hierarchical Imitation and Reinforcement Learning.
Proceedings of the 35th International Conference on Machine Learning, 2018

Open Problem: The Dependence of Sample Complexity Lower Bounds on Planning Horizon.
Proceedings of the Conference on Learning Theory, 2018

Markov Decision Processes with Continuous Side Information.
Proceedings of the Algorithmic Learning Theory, 2018

2017
Repeated Inverse Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Contextual Decision Processes with low Bellman rank are PAC-Learnable.
Proceedings of the 34th International Conference on Machine Learning, 2017

2016
On Structural Properties of MDPs that Bound Loss Due to Shallow Planning.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

The Dependence of Effective Planning Horizon on Model Accuracy.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

Improving Predictive State Representations via Gradient Descent.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Abstraction Selection in Model-based Reinforcement Learning.
Proceedings of the 32nd International Conference on Machine Learning, 2015

Low-Rank Spectral Learning with Weighted Loss Functions.
Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics, 2015

Spectral Learning of Predictive State Representations with Insufficient Statistics.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015

2014
Improving UCT planning via approximate homomorphisms.
Proceedings of the International Conference on Autonomous Agents and Multi-Agent Systems, 2014

