Joar Skalse

According to our database, Joar Skalse has authored at least 18 papers between 2019 and 2024.

Bibliography

2024
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret.
CoRR, 2024

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems.
CoRR, 2024

On the Expressivity of Objective-Specification Formalisms in Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

STARC: A General Framework For Quantifying Differences Between Reward Functions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Goodhart's Law in Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks.
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2023

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning.
Proceedings of the International Conference on Machine Learning, 2023

Misspecification in Inverse Reinforcement Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Defining and Characterizing Reward Hacking.
CoRR, 2022

Defining and Characterizing Reward Gaming.
Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022

Lexicographic Multi-Objective Reinforcement Learning.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

2021
Is SGD a Bayesian sampler? Well, almost.
J. Mach. Learn. Res., 2021

A General Counterexample to Any Decision Theory and Some Responses.
CoRR, 2021

Reinforcement Learning in Newcomblike Environments.
Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021

Safety Properties of Inductive Logic Programming.
Proceedings of the Workshop on Artificial Intelligence Safety 2021 (SafeAI 2021) co-located with the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI 2021), 2021

2019
Neural networks are a priori biased towards Boolean functions with low entropy.
CoRR, 2019

Risks from Learned Optimization in Advanced Machine Learning Systems.
CoRR, 2019
