Stefan Heimersheim

According to our database, Stefan Heimersheim authored at least 9 papers between 2023 and 2024.

Collaborative distances:
  • Dijkstra number of five.
  • Erdős number of four.

Bibliography

2024
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs.
CoRR, 2024

Evolution of SAE Features Across Layers in LLMs.
CoRR, 2024

Characterizing stable regions in the residual stream of LLMs.
CoRR, 2024

Evaluating Synthetic Activations composed of SAE Latents in GPT-2.
CoRR, 2024

You can remove GPT2's LayerNorm by fine-tuning.
CoRR, 2024

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.
CoRR, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.
CoRR, 2024

How to use and interpret activation patching.
CoRR, 2024

2023
Towards Automated Circuit Discovery for Mechanistic Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
