Carson Denison

According to our database, Carson Denison authored at least 7 papers between 2023 and 2024.

Bibliography

2024
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

Gradient-Based Language Model Red Teaming.
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024

2023
How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy.
J. Artif. Intell. Res., 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.
CoRR, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.
CoRR, 2023

Private Ad Modeling with DP-SGD.
Proceedings of the Workshop on Data Mining for Online Advertising (AdKDD 2023) co-located with the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023), 2023
