Nicholas Goldowsky-Dill

According to our database, Nicholas Goldowsky-Dill authored at least 6 papers between 2023 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Timeline

Papers per year: 2023: 1; 2024: 4 (3 other, 1 in proceedings); 2025: 1.

Bibliography

2025
Open Problems in Mechanistic Interpretability.
CoRR, January, 2025

2024
Towards evaluations-based safety cases for AI scheming.
CoRR, 2024

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.
CoRR, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.
CoRR, 2024

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Localizing Model Behavior with Path Patching.
CoRR, 2023

