2025
Prompt-to-Leaderboard.
CoRR, February 2025

How to Evaluate Reward Models for RLHF.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline.
CoRR, 2024