2025

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery.

Xiaoshuai Song

Muxi Diao

Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models.

CoRR, 2024

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

CoRR, 2024

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery.

CoRR, 2024