Samuel Marks
According to our database, Samuel Marks authored at least 9 papers between 2023 and 2024.
Bibliography
2024
The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability.
CoRR, 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models.
CoRR, 2024
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
CoRR, 2024
CoRR, 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models.
CoRR, 2024
2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets.
CoRR, 2023