2025
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models.
CoRR, 2024

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?
CoRR, 2024

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery.
CoRR, 2024