Paul F. Christiano

Affiliations:

OpenAI, USA
University of California, Berkeley, CA, USA (PhD 2017)

According to our database, Paul F. Christiano authored at least 33 papers between 2011 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Timeline: 2012 to 2024

Legend: Book, In proceedings, Article, PhD thesis, Dataset, Other

Bibliography

2024

International Scientific Report on the Safety of Advanced AI (Interim Report).

CoRR, 2024

Towards a Law of Iterated Expectations for Heuristic Estimators.

CoRR, 2024

Backdoor defense, learnability and obfuscation.

CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.

CoRR, 2024

2023

Evaluating Language-Model Agents on Realistic Autonomous Tasks.

CoRR, 2023

Model evaluation for extreme risks.

CoRR, 2023

2022

Formalizing the presumption of independence.

Paul F. Christiano

Eric Neyman

Mark Xu

CoRR, 2022

Training language models to follow instructions with human feedback.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021

A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device.

J. ACM, 2021

Recursively Summarizing Books with Human Feedback.

CoRR, 2021

2020

Learning to summarize from human feedback.

CoRR, 2020

Learning to summarize with human feedback.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019

Fine-Tuning Language Models from Human Preferences.

CoRR, 2019

2018

Supervising strong learners by amplifying weak experts.

Paul F. Christiano

Buck Shlegeris

Dario Amodei

CoRR, 2018

Unrestricted Adversarial Examples.

CoRR, 2018

AI safety via debate.

Geoffrey Irving

Paul F. Christiano

Dario Amodei

CoRR, 2018

Certifiable Randomness from a Single Quantum Device.

CoRR, 2018

2017

Manipulation-resistant online learning.

Paul Francis Christiano

PhD thesis, 2017

Deep Reinforcement Learning from Human Preferences.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016

A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models.

CoRR, 2016

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model.

CoRR, 2016

Robust Collaborative Online Learning.

Paul F. Christiano

CoRR, 2016

Concrete Problems in AI Safety.

CoRR, 2016

Theano: A Python framework for fast computation of mathematical expressions.

Nicolas Boulanger-Lewandowski

Xavier Bouthillier

Alexandre de Brébisson

Samira Ebrahimi Kahou

Pierre-Antoine Manzagol

Christopher Joseph Pal

S. Ramana Subramanyam

CoRR, 2016

Provably manipulation-resistant reputation systems.

Paul F. Christiano

Proceedings of the 29th Conference on Learning Theory, 2016

2015

Reflective Oracles: A Foundation for Classical Game Theory.

Benja Fallenstein

Jessica Taylor

Paul F. Christiano

CoRR, 2015

Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence.

Benja Fallenstein

Jessica Taylor

Paul F. Christiano

Proceedings of the Logic, Rationality, and Interaction - 5th International Workshop, 2015

2014

Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic.

CoRR, 2014

Online local learning via semidefinite programming.

Paul F. Christiano

Proceedings of the Symposium on Theory of Computing, 2014

Open Problem: Online Local Learning.

Paul F. Christiano

Proceedings of The 27th Conference on Learning Theory, 2014

2012

Quantum Money from Hidden Subspaces.

Scott Aaronson

Paul F. Christiano

Electron. Colloquium Comput. Complex., 2012

2011

Lossless Fault-Tolerant Data Structures with Additive Overhead.

Paul F. Christiano

Erik D. Demaine

Shaunak Kishore

Proceedings of the Algorithms and Data Structures - 12th International Symposium, 2011

Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs.

Proceedings of the 43rd ACM Symposium on Theory of Computing, 2011
