Evaluating Frontier Models for Stealth and Situational Awareness.
CoRR, May 2025
From Stability to Inconsistency: A Study of Moral Preferences in LLMs.
CoRR, April 2025
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals.
CoRR, 2022
Formal Algorithms for Transformers.
CoRR, 2022
The inductive bias of ReLU networks on orthogonally separable data.
Proceedings of the 9th International Conference on Learning Representations, 2021
Functional vs. parametric equivalence of ReLU networks.
Proceedings of the 8th International Conference on Learning Representations, 2020
Towards Understanding Knowledge Distillation.
Proceedings of the 36th International Conference on Machine Learning, 2019
Distillation-Based Training for Multi-Exit Architectures.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019