Saurav Kadavath

According to our database1, Saurav Kadavath authored at least 15 papers between 2019 and 2023.

Collaborative distances:

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2023
Specific versus General Principles for Constitutional AI.
CoRR, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.
CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023


2022
Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

DeepChrome 2.0: Investigating and Improving Architectures, Visualizations, & Experiments.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
CoRR, 2022

2021
Pretraining & Reinforcement Learning: Sharpening the Axe Before Cutting the Tree.
CoRR, 2021

Measuring Coding Challenge Competence With APPS.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Mathematical Problem Solving With the MATH Dataset.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2019
Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019


  Loading...