Paul F. Christiano

Affiliations:
  • OpenAI, USA
  • University of California, Berkeley, CA, USA (PhD 2017)


According to our database, Paul F. Christiano authored at least 31 papers between 2011 and 2024.

Bibliography

2024
Towards a Law of Iterated Expectations for Heuristic Estimators.
CoRR, 2024

Backdoor defense, learnability and obfuscation.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

2023
Model evaluation for extreme risks.
CoRR, 2023

2022
Formalizing the presumption of independence.
CoRR, 2022

Training language models to follow instructions with human feedback.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
A Cryptographic Test of Quantumness and Certifiable Randomness from a Single Quantum Device.
J. ACM, 2021

Recursively Summarizing Books with Human Feedback.
CoRR, 2021

2020
Learning to summarize from human feedback.
CoRR, 2020

Learning to summarize with human feedback.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2019
Fine-Tuning Language Models from Human Preferences.
CoRR, 2019

2018
Supervising strong learners by amplifying weak experts.
CoRR, 2018

Unrestricted Adversarial Examples.
CoRR, 2018

AI safety via debate.
CoRR, 2018

Certifiable Randomness from a Single Quantum Device.
CoRR, 2018

2017
Manipulation-resistant online learning.
PhD thesis, 2017

Deep Reinforcement Learning from Human Preferences.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

2016
A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models.
CoRR, 2016

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model.
CoRR, 2016

Robust Collaborative Online Learning.
CoRR, 2016

Concrete Problems in AI Safety.
CoRR, 2016

Theano: A Python framework for fast computation of mathematical expressions.
CoRR, 2016

Provably manipulation-resistant reputation systems.
Proceedings of the 29th Conference on Learning Theory, 2016

2015
Reflective Oracles: A Foundation for Classical Game Theory.
CoRR, 2015

Reflective Oracles: A Foundation for Game Theory in Artificial Intelligence.
Proceedings of the Logic, Rationality, and Interaction - 5th International Workshop, 2015

2014
Robust Cooperation in the Prisoner's Dilemma: Program Equilibrium via Provability Logic.
CoRR, 2014

Online local learning via semidefinite programming.
Proceedings of the Symposium on Theory of Computing, 2014

Open Problem: Online Local Learning.
Proceedings of The 27th Conference on Learning Theory, 2014

2012
Quantum Money from Hidden Subspaces.
Electron. Colloquium Comput. Complex., 2012

2011
Lossless Fault-Tolerant Data Structures with Additive Overhead.
Proceedings of the Algorithms and Data Structures - 12th International Symposium, 2011

Electrical flows, Laplacian systems, and faster approximation of maximum flow in undirected graphs.
Proceedings of the 43rd ACM Symposium on Theory of Computing, 2011

