Han Zhong

Affiliations:
  • Peking University, Center for Data Science, Beijing, China


According to our database, Han Zhong authored at least 31 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
DPO Meets PPO: Reinforced Token Optimization for RLHF.
CoRR, 2024

Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm.
CoRR, 2024

Quantum Non-Identical Mean Estimation: Efficient Algorithms and Fundamental Limits.
Proceedings of the 19th Conference on the Theory of Quantum Computation, Communication and Cryptography, 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Robust Offline Reinforcement Learning under Diverse Data Corruption.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2024

2023
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers?
J. Mach. Learn. Res., 2023

Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF.
CoRR, 2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration.
CoRR, 2023

A Reduction-based Framework for Sequential Decision Making with Delayed Feedback.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Provable Sim-to-real Transfer in Continuous Domain with Partial Observations.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond.
CoRR, 2022

Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets.
Proceedings of the International Conference on Machine Learning, 2022

A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games.
Proceedings of the International Conference on Machine Learning, 2022

Nearly Optimal Policy Optimization with Stable at Any Time Guarantee.
Proceedings of the International Conference on Machine Learning, 2022

Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation.
Proceedings of the International Conference on Machine Learning, 2022

A Reduction-Based Framework for Conservative Bandits and Reinforcement Learning.
Proceedings of the Tenth International Conference on Learning Representations, 2022

2021
Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers?
CoRR, 2021

Optimistic Policy Optimization is Provably Efficient in Non-stationary MDPs.
CoRR, 2021

A Unified Framework for Conservative Exploration.
CoRR, 2021

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy.
CoRR, 2020
