Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs.
CoRR, 2024
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models - The Story Goes On.
CoRR, 2024
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning.
CoRR, 2024
SCC: an efficient deep reinforcement learning agent mastering the game of StarCraft II.
Proceedings of the 38th International Conference on Machine Learning, 2021