Simon Lermen

According to our database, Simon Lermen authored at least 5 papers between 2023 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of five.

Bibliography

2024
Applying Refusal-Vector Ablation to Llama 3.1 70B Agents.
CoRR, 2024

2023
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability.
CoRR, 2023

BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B.
CoRR, 2023

LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B.
CoRR, 2023

Evaluating Shutdown Avoidance of Language Models in Textual Scenarios.
CoRR, 2023

