Luke Marks

According to our database, Luke Marks authored at least 4 papers between 2023 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.



Bibliography

2024
Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders.
CoRR, 2024

Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations.
CoRR, 2024

Interpreting Learned Feedback Patterns in Large Language Models.
Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024 (NeurIPS 2024), 2024

2023
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders.
CoRR, 2023
