Owain Evans

According to our database, Owain Evans authored at least 26 papers between 2009 and 2024.

Bibliography

2024
The Two-Hop Curse: LLMs trained on A->B, B->C fail to learn A->C.
CoRR, 2024

Towards evaluations-based safety cases for AI scheming.
CoRR, 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection.
CoRR, 2024

Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs.
CoRR, 2024

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
CoRR, 2024

Can Language Models Explain Their Own Classification Behavior?
CoRR, 2024

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A".
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Tell, don't show: Declarative facts influence how LLMs generalize.
CoRR, 2023

Taken out of context: On measuring situational awareness in LLMs.
CoRR, 2023

2022
Teaching Models to Express Their Uncertainty in Words.
Trans. Mach. Learn. Res., 2022

Forecasting Future World Events With Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

TruthfulQA: Measuring How Models Mimic Human Falsehoods.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Truthful AI: Developing and governing AI that does not lie.
CoRR, 2021

2020
Active Reinforcement Learning: Observing Rewards at a Cost.
CoRR, 2020

2019
Sensory Optimization: Neural Networks as a Model for Understanding and Creating Art.
CoRR, 2019

Generalizing from a few environments in safety-critical reinforcement learning.
CoRR, 2019

2018
Viewpoint: When Will AI Exceed Human Performance? Evidence from AI Experts.
J. Artif. Intell. Res., 2018

Active Reinforcement Learning with Monte-Carlo Tree Search.
CoRR, 2018

The Malicious Use of Artificial Intelligence: Forecasting, Prevention, and Mitigation.
CoRR, 2018

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention.
Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, 2018

2017
When Will AI Exceed Human Performance? Evidence from AI Experts.
CoRR, 2017

Agent-Agnostic Human-in-the-Loop Reinforcement Learning.
CoRR, 2017

2016
Learning the Preferences of Ignorant, Inconsistent Agents.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2009
Help or Hinder: Bayesian Models of Social Goal Inference.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009

