Henry Sleight

According to our database, Henry Sleight authored at least 8 papers in 2024.

Bibliography

2024
Best-of-N Jailbreaking.
CoRR, 2024

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach.
CoRR, 2024

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats.
CoRR, 2024

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples.
CoRR, 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection.
CoRR, 2024

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
CoRR, 2024

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data.
CoRR, 2024

