Wei Xiong

Affiliations:
  • University of Illinois Urbana-Champaign, Department of Computer Science, Urbana, IL, USA
  • Hong Kong University of Science and Technology, Hong Kong (former)


According to our database, Wei Xiong authored at least 28 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
RLHF Workflow: From Reward Modeling to Online RLHF.
CoRR, 2024

DPO Meets PPO: Reinforced Token Optimization for RLHF.
CoRR, 2024

A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference.
CoRR, 2024

LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2024

Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Mitigating the Alignment Tax of RLHF.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts.
Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization.
Proceedings of the Computer Vision - ECCV 2024, 2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Reward Teaching for Federated Multiarmed Bandits.
IEEE Trans. Signal Process., 2023

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment.
Trans. Mach. Learn. Res., 2023

Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF.
CoRR, 2023

Mitigating the Alignment Tax of RLHF.
CoRR, 2023

One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration.
CoRR, 2023

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment.
CoRR, 2023

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Reward Teaching for Federated Multi-armed Bandits.
Proceedings of the IEEE International Symposium on Information Theory, 2023

Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes.
Proceedings of the International Conference on Machine Learning, 2023

Provably Efficient Offline Reinforcement Learning with Perturbed Data Sources.
Proceedings of the International Conference on Machine Learning, 2023

Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond.
CoRR, 2022

Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets.
Proceedings of the International Conference on Machine Learning, 2022

A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games.
Proceedings of the International Conference on Machine Learning, 2022

2021
Distributional Reinforcement Learning for Multi-Dimensional Reward Functions.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

(Almost) Free Incentivized Exploration from Decentralized Learning Agents.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Heterogeneous Multi-player Multi-armed Bandits: Closing the Gap and Generalization.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2020
PMGT-VR: A decentralized proximal-gradient algorithmic framework with variance reduction.
CoRR, 2020

Decentralized Multi-player Multi-armed Bandits with No Collision Information.
Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics, 2020
