Yunhao Tang

According to our database, Yunhao Tang authored at least 63 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
An Analysis of Quantile Temporal-Difference Learning.
J. Mach. Learn. Res., 2024

On scalable oversight with weak LLMs judging strong LLMs.
CoRR, 2024

A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning.
CoRR, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment.
CoRR, 2024

Understanding the performance gap between online and offline alignment algorithms.
CoRR, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.
CoRR, 2024

Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model.
CoRR, 2024

Off-policy Distributional Q(λ): Distributional RL without Importance Sampling.
CoRR, 2024

A Distributional Analogue to the Successor Representation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Generalized Preference Optimization: A Unified Approach to Offline Alignment.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Human Alignment of Large Language Models through Online Preference Optimisation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Learning Uncertainty-Aware Temporally-Extended Actions.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Nash Learning from Human Feedback.
CoRR, 2023

Fast Rates for Maximum Entropy Exploration.
Proceedings of the International Conference on Machine Learning, 2023

VA-learning as a more efficient alternative to Q-learning.
Proceedings of the International Conference on Machine Learning, 2023

Towards a better understanding of representation dynamics under TD-learning.
Proceedings of the International Conference on Machine Learning, 2023

DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm.
Proceedings of the International Conference on Machine Learning, 2023

Understanding Self-Predictive Learning for Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2023

The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation.
Proceedings of the International Conference on Machine Learning, 2023

The Edge of Orthogonality: A Simple View of What Makes BYOL Tick.
Proceedings of the International Conference on Machine Learning, 2023

Quantile Credit Assignment.
Proceedings of the International Conference on Machine Learning, 2023

Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice.
Proceedings of the International Conference on Machine Learning, 2023

Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition.
Proceedings of the International Conference on Machine Learning, 2023

2022
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal.
CoRR, 2022

The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

BYOL-Explore: Exploration by Bootstrapped Prediction.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses.
Proceedings of the International Conference on Machine Learning, 2022

Biased Gradient Estimate with Drastic Variance Reduction for Meta Reinforcement Learning.
Proceedings of the International Conference on Machine Learning, 2022

Marginalized Operators for Off-policy Reinforcement Learning.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022

2021
Reinforcement Learning: New Algorithms and An Application for Integer Programming.
PhD thesis, 2021

Unlocking Pixels for Reinforcement Learning via Implicit Attention.
CoRR, 2021

ES-ENAS: Combining Evolution Strategies with Neural Architecture Search at No Extra Cost for Reinforcement Learning.
CoRR, 2021

Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Taylor Expansion of Discount Factors.
Proceedings of the 38th International Conference on Machine Learning, 2021

Revisiting Peng's Q(λ) for Modern Reinforcement Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021

Guiding Evolutionary Strategies with Off-Policy Actor-Critic.
Proceedings of the AAMAS '21: 20th International Conference on Autonomous Agents and Multiagent Systems, 2021

Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning.
Proceedings of the 24th International Conference on Artificial Intelligence and Statistics, 2021

2020
Online Hyper-parameter Tuning in Off-policy Learning via Evolutionary Strategies.
CoRR, 2020

Discrete Action On-Policy Learning with Action-Value Critic.
CoRR, 2020

Self-Imitation Learning via Generalized Lower Bound Q-learning.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Taylor Expansion Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Reinforcement Learning for Integer Programming: Learning to Cut.
Proceedings of the 37th International Conference on Machine Learning, 2020

Learning to Score Behaviors for Guided Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

Monte-Carlo Tree Search as Regularized Policy Optimization.
Proceedings of the 37th International Conference on Machine Learning, 2020

ES-MAML: Simple Hessian-Free Meta Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

Discrete Action On-Policy Learning with Action-Value Critic.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Variance Reduction for Evolution Strategies via Structured Control Variates.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Practical Nonisotropic Monte Carlo Sampling in High Dimensions via Determinantal Point Processes.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020

Discretizing Continuous Action Space for On-Policy Optimization.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Reinforcement Learning with Chromatic Networks.
CoRR, 2019

Wasserstein Reinforcement Learning.
CoRR, 2019

Structured Monte Carlo Sampling for Nonisotropic Distributions via Determinantal Point Processes.
CoRR, 2019

Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy.
CoRR, 2019

Adaptive Sample-Efficient Blackbox Optimization via ES-active Subspaces.
CoRR, 2019

From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Provably Robust Blackbox Optimization for Reinforcement Learning.
Proceedings of the 3rd Annual Conference on Robot Learning, 2019

Orthogonal Estimation of Wasserstein Distances.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

KAMA-NNs: Low-dimensional Rotation Based Neural Networks.
Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019

2018
Boosting Trust Region Policy Optimization by Normalizing Flows Policy.
CoRR, 2018

Implicit Policy for Reinforcement Learning.
CoRR, 2018

Exploration by Distributional Reinforcement Learning.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

2017
Variational Deep Q Network.
CoRR, 2017

