Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric.
CoRR, February, 2025
SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance.
CoRR, 2024
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models.
CoRR, 2024