Scaling Laws For Scalable Oversight.

[DOI]

Joshua Engels, David D. Baek, Subhash Kantamneni, Max Tegmark

CoRR, April, 2025

Towards Understanding Distilled Reasoning Models: A Representational Approach.

[DOI]

David D. Baek, Max Tegmark

CoRR, March, 2025

Harmonic Loss Trains Interpretable AI Models.

[DOI]

CoRR, February, 2025

The Geometry of Concepts: Sparse Autoencoder Feature Structure.

[DOI]

CoRR, 2024

Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning.

[DOI]

David D. Baek, Yuxiao Li, Max Tegmark

CoRR, 2024

GenEFT: Understanding Statics and Dynamics of Model Generalization via Effective Theory.

[DOI]

David D. Baek, Ziming Liu, Max Tegmark

CoRR, 2024