Nora Belrose

According to our database, Nora Belrose authored at least 19 papers between 2022 and 2025.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2025

Examining Two Hop Reasoning Through Information Content Scaling.

David Johnston

Nora Belrose

CoRR, February, 2025

Slowing Learning by Erasing Simple Features.

Lucia Quirke

Nora Belrose

CoRR, February, 2025

Converting MLPs into Polynomials in Closed Form.

Nora Belrose

Alice Rigg

CoRR, February, 2025

Partially Rewriting a Transformer in Natural Language.

Gonçalo Paulo

Nora Belrose

CoRR, January, 2025

Transcoders Beat Sparse Autoencoders for Interpretability.

Gonçalo Paulo

Stepan Shabalin

Nora Belrose

CoRR, January, 2025

Estimating the Probability of Sampling a Trained Neural Network at Random.

Adam Scherlis

Nora Belrose

CoRR, January, 2025

Sparse Autoencoders Trained on the Same Data Learn Different Features.

Gonçalo Paulo

Nora Belrose

CoRR, January, 2025

2024

Understanding Gradient Descent through the Training Jacobian.

Nora Belrose

Adam Scherlis

CoRR, 2024

Refusal in LLMs is an Affine Function.

Thomas Marshall

Adam Scherlis

Nora Belrose

CoRR, 2024

Automatically Interpreting Millions of Features in Large Language Models.

CoRR, 2024

Balancing Label Quantity and Quality for Scalable Elicitation.

Alex Mallen

Nora Belrose

CoRR, 2024

Does Transformer Interpretability Transfer to RNNs?

Gonçalo Paulo

Thomas Marshall

Nora Belrose

CoRR, 2024

Neural Networks Learn Statistics of Increasing Complexity.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Eliciting Latent Knowledge from Quirky Language Models.

Alex Mallen

Nora Belrose

CoRR, 2023

Eliciting Latent Predictions from Transformers with the Tuned Lens.

CoRR, 2023

LEACE: Perfect linear concept erasure in closed form.

Nora Belrose

David Schneider-Joseph

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adversarial Policies Beat Superhuman Go AIs.

Proceedings of the International Conference on Machine Learning, 2023

2022

imitation: Clean Imitation Learning Implementations.

CoRR, 2022

Adversarial Policies Beat Professional-Level Go AIs.

CoRR, 2022
