2025
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability.
CoRR, 2025

Sparse Autoencoders Do Not Find Canonical Units of Analysis.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion.
Trans. Mach. Learn. Res., 2024

LLM Circuit Analyses Are Consistent Across Training and Scale.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Linear Representations of Sentiment in Large Language Models.
CoRR, 2023

Can Transformers Learn to Solve Problems Recursively?
CoRR, 2023