Vladimir Mikulik

According to our database, Vladimir Mikulik authored at least 13 papers between 2019 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Challenges with unsupervised LLM knowledge discovery.
CoRR, 2023

The Hydra Effect: Emergent Self-repair in Language Model Computations.
CoRR, 2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
CoRR, 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

2022
Teaching language models to support answers with verified quotes.
CoRR, 2022

2021
Scaling Language Models: Methods, Analysis & Insights from Training Gopher.
CoRR, 2021

Alignment of Language Agents.
CoRR, 2021

Causal Analysis of Agent Behavior for AI Safety.
CoRR, 2021

2020
Algorithms for Causal Reasoning in Probability Trees.
CoRR, 2020

Meta-trained agents implement Bayes-optimal agents.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
Neural networks are a priori biased towards Boolean functions with low entropy.
CoRR, 2019

Risks from Learned Optimization in Advanced Machine Learning Systems.
CoRR, 2019

