CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models.
CoRR, 2024
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
CoRR, 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery.
CoRR, 2024