UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs.
CoRR, 2024
OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems.
,
,
,
,
,
,
,
,
,
,
,
,
,
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024