Francis Rhys Ward

According to our database, Francis Rhys Ward authored at least 10 papers between 2020 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations.
CoRR, 2024

Evaluating Language Model Character Traits.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

The Reasons that Agents Act: Intention and Instrumental Goals.
Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems, 2024

2023
Honesty Is the Best Policy: Defining and Mitigating AI Deception.
CoRR, 2023

Experiments with Detecting and Mitigating AI Deception.
CoRR, 2023

Defining Deception in Structural Causal Games.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

2022
Argumentative Reward Learning: Reasoning About Human Preferences.
CoRR, 2022

A Causal Perspective on AI Deception in Games.
Proceedings of the International Conference on Logic Programming 2022 Workshops co-located with the 38th International Conference on Logic Programming (ICLP 2022), Haifa, Israel, July 31st, 2022

On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios.
Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, 2022

2020
An Assurance Case Pattern for the Interpretability of Machine Learning in Safety-Critical Systems.
Proceedings of the Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops, 2020

