Luke Marks

According to our database, Luke Marks authored at least 4 papers between 2023 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders.

CoRR, 2024

Informal Safety Guarantees for Simulated Optimizers Through Extrapolation from Partial Simulations.

Luke Marks

CoRR, 2024

Interpreting Learned Feedback Patterns in Large Language Models.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023

Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders.

CoRR, 2023
