Jared Kaplan

According to our database, Jared Kaplan authored at least 35 papers between 2007 and 2024.

Bibliography

2024
Sabotage Evaluations for Frontier Models.
CoRR, 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Evaluating and Mitigating Discrimination in Language Model Decisions.
CoRR, 2023

Specific versus General Principles for Constitutional AI.
CoRR, 2023

Studying Large Language Model Generalization with Influence Functions.
CoRR, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.
CoRR, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.
CoRR, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models.
CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023


2022
Scaling Laws from the Data Manifold Dimension.
J. Mach. Learn. Res., 2022

Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

In-context Learning and Induction Heads.
CoRR, 2022

Toy Models of Superposition.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Scaling Laws and Interpretability of Learning from Repeated Data.
CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
CoRR, 2022

Predictability and Surprise in Large Generative Models.
CoRR, 2022


2021
A General Language Assistant as a Laboratory for Alignment.
CoRR, 2021

Evaluating Large Language Models Trained on Code.
CoRR, 2021

Explaining Neural Scaling Laws.
CoRR, 2021

Scaling Laws for Transfer.
CoRR, 2021

Data and Parameter Scaling Laws for Neural Machine Translation.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Scaling Laws for Autoregressive Generative Modeling.
CoRR, 2020

A Neural Scaling Law from the Dimension of the Data Manifold.
CoRR, 2020

Scaling Laws for Neural Language Models.
CoRR, 2020


2018
An Empirical Model of Large-Batch Training.
CoRR, 2018

2007
Explaining Debugging Strategies to End-User Programmers.
Proceedings of the 2007 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2007), 2007

