Matthew Rahtz

According to our database, Matthew Rahtz authored at least 10 papers between 2016 and 2024.

Bibliography

2024
A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI.
CoRR, 2024

Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

2023
The Hydra Effect: Emergent Self-repair in Language Model Computations.
CoRR, 2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023 (NeurIPS 2023), 2023

2022
Safe Deep RL in 3D Environments using Human Feedback.
CoRR, 2022

2019
An Extensible Interactive Interface for Agent Design.
CoRR, 2019

2017
Truth in the 'killer robots' angle?
AI Matters, 2017

2016
Ensembl 2016.
Nucleic Acids Res., 2016
