2025
Evaluating Frontier Models for Stealth and Situational Awareness.
CoRR, May 2025

From Stability to Inconsistency: A Study of Moral Preferences in LLMs.
CoRR, April 2025

2024
Evaluating Frontier Models for Dangerous Capabilities.
CoRR, 2024

2023
Model evaluation for extreme risks.
CoRR, 2023

2022
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals.
CoRR, 2022

Formal Algorithms for Transformers.
CoRR, 2022

2021
The inductive bias of ReLU networks on orthogonally separable data.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Functional vs. parametric equivalence of ReLU networks.
Proceedings of the 8th International Conference on Learning Representations, 2020

2019
Towards Understanding Knowledge Distillation.
Proceedings of the 36th International Conference on Machine Learning, 2019

Distillation-Based Training for Multi-Exit Architectures.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019