Xander Davies

According to our database, Xander Davies authored at least 6 papers between 2023 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
CoRR, 2024

2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023

Circuit Breaking: Removing Model Behaviors with Targeted Ablation.
CoRR, 2023

Discovering Variable Binding Circuitry with Desiderata.
CoRR, 2023

Unifying Grokking and Double Descent.
CoRR, 2023

Sparse Distributed Memory is a Continual Learner.
Proceedings of the Eleventh International Conference on Learning Representations, 2023
