Henry Sleight

According to our database¹, Henry Sleight authored at least 8 papers in 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of three.

Bibliography

2024

Best-of-N Jailbreaking.

CoRR, 2024

Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach.

CoRR, 2024

Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats.

CoRR, 2024

Rapid Response: Mitigating LLM Jailbreaks with a Few Examples.

CoRR, 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection.

CoRR, 2024

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.

Dylan Hadfield-Menell

Stephen Casper

CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?

CoRR, 2024

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data.

Matthias Gerstgrasser

CoRR, 2024
