Jared Kaplan

According to our database, Jared Kaplan authored at least 35 papers between 2007 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Sabotage Evaluations for Frontier Models.

[...], Eric Christiansen, [...], Holden Karnofsky, [...], Samuel R. Bowman, [...]

CoRR, 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.

[...], Monte MacDiarmid, [...], Nicholas Schiefer, [...], Samuel R. Bowman, [...]

CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.

CoRR, 2024

2023

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.

Aarohi Srivastava, Abhinav Rastogi, [...], Abu Awal Md Shoeb, [...], Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, [...], Alexander W. Kocurek, [...], Anantharaman S. Iyer, Anders Andreassen, [...], Andrea Santilli, Andreas Stuhlmüller, [...], Andrew K. Lampinen, [...], Antonio Norelli, [...], Arash Gholamidavoodi, [...], Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, [...], B. Ryan Roberts, [...], Bartlomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, [...], Bill Yuchen Lin, [...], Catherine Stinson, Cedrick Argueta, Cèsar Ferri Ramírez, [...], Charles Rathkopf, [...], Chris Callison-Burch, [...], Christian Voigt, Christopher D. Manning, Christopher Potts, [...], Clara E. Rivera, [...], Courtney Ashcraft, Cristina Garbacea, [...], Daniel Khashabi, [...], Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, [...], Daphne Ippolito, [...], Debajyoti Datta, [...], Dimitri Coelho Mollo, [...], Ekaterina Shutova, Ekin Dogus Cubuk, [...], Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, [...], Emanuele Rodolà, [...], Ethan J. Jerzak, [...], Eunice Engefu Manyasi, Evgenii Zheltonozhskii, [...], Fernando Martínez-Plumed, Francesca Happé, François Chollet, [...], Genta Indra Winata, [...], Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, [...], Gonzalo Jaimovitch-López, [...], Hana Galijasevic, [...], Hannaneh Hajishirzi, [...], Hinrich Schütze, [...], Jack Geissinger, Jackson Kernion, [...], Jaime Fernández Fisac, [...], Janelle Wingfield, [...], Jascha Sohl-Dickstein, [...], Jekaterina Novikova, [...], Jonathan Batchelder, Jonathan Berant, [...], José Hernández-Orallo, Joseph Boudeman, [...], Joshua B. Tenenbaum, [...], Karthik Gopalakrishnan, Katerina Ignatyeva, [...], Kaustubh D. Dhole, [...], Kristen Chiafullo, Ksenia Shkaruta, [...], Kyle Richardson, [...], Lidia Contreras Ochando, Louis-Philippe Morency, [...], Luis Oliveros Colón, [...], Lütfi Kerem Senel, [...], Maartje ter Hoeve, [...], María José Ramírez-Quintana, [...], Mario Giulianelli, [...], Martin Potthast, Matthew L. Leavitt, [...], Mátyás Schubert, Medina Baitemirova, [...], Melvin McElrath, [...], Michael I. Ivanitskiy, Michael Starritt, [...], Michal Swedrowski, Michele Bevilacqua, Michihiro Yasunaga, [...], Moin Aminnaseri, [...], Mukund Varma T., [...], Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, [...], Nicole Martinez, [...], Niklas Muennighoff, Nitish Shirish Keskar, [...], Omar Elbaghdadi, [...], Pablo Antonio Moreno Casares, [...], Pegah Alipoormolabashi, [...], Peter Eckersley, [...], Piotr Milkowski, [...], Pouya Pezeshkpour, [...], Rachel Etta Rudolph, [...], Raphaël Millière, [...], Robbe Raymaekers, [...], Ruslan Salakhutdinov, [...], Saif M. Mohammad, [...], Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, [...], Sarik Ghazarian, [...], Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, [...], Shashank Srivastava, [...], Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, [...], Shyamolima (Shammie) Debnath, [...], Simon Thormeyer, [...], Sneha Priscilla Makini, [...], Sriharsha Hatwar, Stanislas Dehaene, [...], Stella Biderman, [...], Steven T. Piantadosi, Stuart M. Shieber, Summer Misherghi, Svetlana Kiritchenko, [...], Tatsu Hashimoto, [...], Théo Desbordes, Theodore Rothschild, [...], Tiberius Nkinyili, [...], Tobias Gerstenberg, [...], Trishala Neeraj, [...], Victoria Nyamai, [...], Vinay V. Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, [...], William Saunders, [...], Yadollah Yaghoobzadeh, [...], Yonatan Belinkov, [...]

Trans. Mach. Learn. Res., 2023

Evaluating and Mitigating Discrimination in Language Model Decisions.

[...], Nicholas Joseph, [...]

CoRR, 2023

Specific versus General Principles for Constitutional AI.

CoRR, 2023

Studying Large Language Model Generalization with Influence Functions.

Roger B. Grosse, [...], Amirhossein Tajdini, [...], Kamile Lukosiute, [...], Nicholas Joseph, [...], Samuel R. Bowman

CoRR, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.

CoRR, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.

CoRR, 2023

Towards Measuring the Representation of Subjective Global Opinions in Language Models.

[...], Nicholas Schiefer, [...], Zac Hatfield-Dodds, Danny Hernandez, Nicholas Joseph, [...]

CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.

CoRR, 2023

Discovering Language Model Behaviors with Model-Written Evaluations.

[...], Kamile Lukosiute, [...], Catherine Olsson, [...], Saurav Kadavath, [...], Cameron McKinnon, Christopher Olah, [...], Eli Tran-Johnson, [...], Jackson Kernion, [...], Landon Goldberg, [...], Michael Sellitto, [...], Neerav Kingsland, [...], Nicholas Joseph, [...], Timothy Telleen-Lawton, [...], Zac Hatfield-Dodds, [...], Samuel R. Bowman, [...], Danny Hernandez, [...], Nicholas Schiefer, [...]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Scaling Laws from the Data Manifold Dimension.

J. Mach. Learn. Res., 2022

Discovering Language Model Behaviors with Model-Written Evaluations.

[...], Kamile Lukosiute, [...], Catherine Olsson, [...], Saurav Kadavath, [...], Cameron McKinnon, Christopher Olah, [...], Eli Tran-Johnson, [...], Jackson Kernion, [...], Landon Goldberg, [...], Michael Sellitto, [...], Neerav Kingsland, [...], Nicholas Joseph, [...], Timothy Telleen-Lawton, [...], Zac Hatfield-Dodds, [...], Samuel R. Bowman, [...], Danny Hernandez, [...], Nicholas Schiefer, [...]

CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.

CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.

CoRR, 2022

In-context Learning and Induction Heads.

CoRR, 2022

Toy Models of Superposition.

[...], Catherine Olsson, Nicholas Schiefer, [...], Zac Hatfield-Dodds, [...], Martin Wattenberg, Christopher Olah

CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.

CoRR, 2022

Language Models (Mostly) Know What They Know.

CoRR, 2022

Scaling Laws and Interpretability of Learning from Repeated Data.

Danny Hernandez, [...], Zac Hatfield-Dodds, [...], Catherine Olsson, [...], Nicholas Joseph, [...]

CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.

CoRR, 2022

Predictability and Surprise in Large Generative Models.

CoRR, 2022

Predictability and Surprise in Large Generative Models.

Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

2021

A General Language Assistant as a Laboratory for Alignment.

CoRR, 2021

Evaluating Large Language Models Trained on Code.

[...], Henrique Pondé de Oliveira Pinto, [...], Nicholas Joseph, [...], Gretchen Krueger, [...], Mohammad Bavarian, [...], Philippe Tillet, Felipe Petroski Such, [...], Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, [...], Igor Babuschkin, [...], William Saunders, Christopher Hesse, [...], Wojciech Zaremba

CoRR, 2021

Explaining Neural Scaling Laws.

CoRR, 2021

Scaling Laws for Transfer.

Danny Hernandez, [...]

CoRR, 2021

Data and Parameter Scaling Laws for Neural Machine Translation.

Mitchell A. Gordon, [...]

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020

Scaling Laws for Autoregressive Generative Modeling.

[...], Christopher Hesse, [...], Prafulla Dhariwal, [...], Daniel M. Ziegler, [...]

CoRR, 2020

A Neural Scaling Law from the Dimension of the Data Manifold.

CoRR, 2020

Scaling Laws for Neural Language Models.

CoRR, 2020

Language Models are Few-Shot Learners.

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

2018

An Empirical Model of Large-Batch Training.

[...], OpenAI Dota Team

CoRR, 2018

2007

Explaining Debugging Strategies to End-User Programmers.

Neeraja Subrahmaniyan, [...], Margaret M. Burnett

Proceedings of the 2007 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC 2007), 2007
