2025
Parameterized Synthetic Text Generation with SimpleStories.
CoRR, April 2025

Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition.
CoRR, January 2025

2024
Towards evaluations-based safety cases for AI scheming.
CoRR, 2024

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks.
CoRR, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability.
CoRR, 2024

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024 (NeurIPS 2024), 2024

2022
Interpreting Neural Networks through the Polytope Lens.
CoRR, 2022

2020
Construction and Elicitation of a Black Box Model in the Game of Bridge.
CoRR, 2020

2013
Structural learning.
Scholarpedia, 2013