Safety Misalignment Against Large Language Models.
Proceedings of the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025), 2025
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts.
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-25), 2025
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models.
CoRR, 2024
Have You Merged My Model? On the Robustness of Large Language Model IP Protection Methods Against Model Merging.
Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis, 2024