2025
Scaling Laws For Scalable Oversight.
CoRR, April 2025

Towards Understanding Distilled Reasoning Models: A Representational Approach.
CoRR, March 2025

Harmonic Loss Trains Interpretable AI Models.
CoRR, February 2025

2024
The Geometry of Concepts: Sparse Autoencoder Feature Structure.
CoRR, 2024

Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning.
CoRR, 2024

GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory.
CoRR, 2024