2025

Evaluating Frontier Models for Stealth and Situational Awareness.

Mary Phuong

Roland S. Zimmermann

CoRR, May, 2025

From Stability to Inconsistency: A Study of Moral Preferences in LLMs.

Monika Jotautaite

Mary Phuong

Chatrik Singh Mangat

Maria Angelica Martinez

CoRR, April, 2025

2024

Evaluating Frontier Models for Dangerous Capabilities.

CoRR, 2024

2023

Model evaluation for extreme risks.

CoRR, 2023

2022

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals.

CoRR, 2022

Formal Algorithms for Transformers.

Mary Phuong

Marcus Hutter

CoRR, 2022

2021

The inductive bias of ReLU networks on orthogonally separable data.

Mary Phuong

Christoph H. Lampert

Proceedings of the 9th International Conference on Learning Representations, 2021

2020

Functional vs. parametric equivalence of ReLU networks.

Mary Phuong

Christoph H. Lampert

Proceedings of the 8th International Conference on Learning Representations, 2020

2019

Towards Understanding Knowledge Distillation.

Mary Phuong

Christoph Lampert

Proceedings of the 36th International Conference on Machine Learning, 2019

Distillation-Based Training for Multi-Exit Architectures.

Mary Phuong

Christoph Lampert

Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019