Arthur Conmy

According to our database, Arthur Conmy authored at least 14 papers between 2022 and 2024.


Timeline

Publications per year: 2022: 1, 2023: 4, 2024: 9.



Bibliography

2024
Improving Steering Vectors by Targeting Sparse Autoencoder Features.
CoRR, 2024

Applying sparse autoencoders to unlearn knowledge in language models.
CoRR, 2024

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2.
CoRR, 2024

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders.
CoRR, 2024

Interpreting Attention Layer Outputs with Sparse Autoencoders.
CoRR, 2024

Improving Dictionary Learning with Gated Sparse Autoencoders.
CoRR, 2024

Improving Sparse Decomposition of Language Model Activations with Gated Sparse Autoencoders.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Stealing part of a production language model.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Successor Heads: Recurring, Interpretable Attention Heads In The Wild.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Attribution Patching Outperforms Automated Circuit Discovery.
CoRR, 2023

Copy Suppression: Comprehensively Understanding an Attention Head.
CoRR, 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 Small.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

2022
StyleGAN-Induced Data-Driven Regularization for Inverse Problems.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

