AdvBDGen: Adversarially Fortified Prompt-Specific Fuzzy Backdoor Generator Against LLM Alignment.
CoRR, 2024
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
CoRR, 2024
Is poisoning a real threat to LLM alignment? Maybe more so than you think.
CoRR, 2024