Large Language Model Safety: A Holistic Survey.
CoRR, 2024
Self-Pluralising Culture Alignment for Large Language Models.
CoRR, 2024
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models.
CoRR, 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety.
CoRR, 2024
Identifying Multiple Personalities in Large Language Models with External Evaluation.
CoRR, 2024
LFED: A Literary Fiction Evaluation Dataset for Large Language Models.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Evaluating Large Language Models: A Comprehensive Survey.
CoRR, 2023
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models.
CoRR, 2023
CS2W: A Chinese Spoken-to-Written Style Conversion Dataset with Multiple Conversion Types.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023