2025
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming.
CoRR, January, 2025

2024
Sabotage Evaluations for Frontier Models.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

Many-shot Jailbreaking.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Studying Large Language Model Generalization with Influence Functions.
CoRR, 2023

2022
Solving Quantitative Reasoning Problems with Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Exploring Length Generalization in Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Path Independent Equilibrium Models Can Better Exploit Test-Time Computation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Sparse capsule networks for informative representation learning in digital pathology.
Proceedings of the Medical Imaging 2022: Digital and Computational Pathology, 2022

2021
Learning to Give Checkable Answers with Prover-Verifier Games.
CoRR, 2021

Learning to Elect.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

2019
Preventing Gradient Attenuation in Lipschitz Constrained Convolutional Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Sorting Out Lipschitz Function Approximation.
Proceedings of the 36th International Conference on Machine Learning, 2019

TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Training Deep Networks With Synthetic Data: Bridging the Reality Gap by Domain Randomization.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018