Dmitrii Krasheninnikov

ORCID: 0009-0009-4387-8407

According to our database, Dmitrii Krasheninnikov authored at least 10 papers between 2019 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks.

Madeline Brumley

Joe Kwon

David Krueger

Dmitrii Krasheninnikov

Usman Anwar

CoRR, 2024

Stress-Testing Capability Elicitation With Password-Locked Models.

Ryan Greenblatt

Fabien Roger

Dmitrii Krasheninnikov

David Krueger

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Implicit meta-learning may lead language models to trust more reliable sources.

Dmitrii Krasheninnikov

Egor Krasheninnikov

Bruno Kacper Mlodozeniec

Tegan Maharaj

David Krueger

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.

Trans. Mach. Learn. Res., 2023

Meta- (out-of-context) learning in neural networks.

Dmitrii Krasheninnikov

Egor Krasheninnikov

Bruno Mlodozeniec

David Krueger

CoRR, 2023

Harms from Increasingly Agentic Algorithmic Systems.

Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023

2022

Defining and Characterizing Reward Hacking.

Joar Skalse

Nikolaus H. R. Howe

Dmitrii Krasheninnikov

David Krueger

CoRR, 2022

Defining and Characterizing Reward Gaming.

Joar Skalse

Nikolaus H. R. Howe

Dmitrii Krasheninnikov

David Krueger

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

Combining Reward Information from Multiple Sources.

Dmitrii Krasheninnikov

Rohin Shah

Herke van Hoof

CoRR, 2021

2019

Preferences Implicit in the State of the World.

Rohin Shah

Dmitrii Krasheninnikov

Jordan Alexander

Pieter Abbeel

Anca D. Dragan

Proceedings of the 7th International Conference on Learning Representations, 2019
