Nora Belrose

According to our database, Nora Belrose authored at least 8 papers between 2022 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Does Transformer Interpretability Transfer to RNNs?
CoRR, 2024

Neural Networks Learn Statistics of Increasing Complexity.
CoRR, 2024

2023
Eliciting Latent Knowledge from Quirky Language Models.
CoRR, 2023

Eliciting Latent Predictions from Transformers with the Tuned Lens.
CoRR, 2023

LEACE: Perfect linear concept erasure in closed form.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Adversarial Policies Beat Superhuman Go AIs.
Proceedings of the International Conference on Machine Learning, 2023

2022
imitation: Clean Imitation Learning Implementations.
CoRR, 2022

Adversarial Policies Beat Professional-Level Go AIs.
CoRR, 2022

