2024
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks.
CoRR, 2024

2023
Explore, Establish, Exploit: Red Teaming Language Models from Scratch.
CoRR, 2023

2022
Forecasting Future World Events With Neural Networks.
Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022