Adam Gleave

Orcid: 0000-0002-3467-528X

According to our database, Adam Gleave authored at least 28 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Scaling Laws for Data Poisoning in LLMs.
CoRR, 2024

Exploring Scaling Trends in LLM Robustness.
CoRR, 2024

Planning behavior in a recurrent neural network that plays Sokoban.
CoRR, 2024

Can Go AIs be adversarially robust?
CoRR, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings.
CoRR, 2024

STARC: A General Framework For Quantifying Differences Between Reward Functions.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Exploiting Novel GPT-4 APIs.
CoRR, 2023

On The Fragility of Learned Reward Functions.
CoRR, 2023

Adversarial Policies Beat Superhuman Go AIs.
Proceedings of the International Conference on Machine Learning, 2023

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning.
Proceedings of the International Conference on Machine Learning, 2023

2022
Towards Trustworthy Machine Learning.
PhD thesis, 2022

imitation: Clean Imitation Learning Implementations.
CoRR, 2022

Adversarial Policies Beat Professional-Level Go AIs.
CoRR, 2022

Calculus on MDPs: Potential Shaping as a Gradient.
CoRR, 2022

Reducing Exploitability with Population Based Training.
CoRR, 2022

Preprocessing Reward Functions for Interpretability.
CoRR, 2022

A Primer on Maximum Causal Entropy Inverse Reinforcement Learning.
CoRR, 2022

Uncertainty Estimation for Language Reward Models.
CoRR, 2022

2021
Stable-Baselines3: Reliable Reinforcement Learning Implementations.
J. Mach. Learn. Res., 2021

Quantifying Differences in Reward Functions.
Proceedings of the 9th International Conference on Learning Representations, 2021

2020
Understanding Learned Reward Functions.
CoRR, 2020

DERAIL: Diagnostic Environments for Reward And Imitation Learning.
CoRR, 2020

Adversarial Policies: Attacking Deep Reinforcement Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020

2018
Inverse reinforcement learning for video games.
CoRR, 2018

Active Inverse Reward Design.
CoRR, 2018

Multi-task Maximum Entropy Inverse Reinforcement Learning.
CoRR, 2018

2017
Making Compression Algorithms for Unicode Text.
Proceedings of the 2017 Data Compression Conference, 2017

2016
Firmament: Fast, Centralized Cluster Scheduling at Scale.
Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, 2016

