2025
Effect of the digital transformation of firms in a developing country on their reverse innovation: the complementary roles of multi-contexts.
Int. J. Technol. Manag., 2025

2024
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

Many-shot Jailbreaking.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Specific versus General Principles for Constitutional AI.
CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023

Motion Analysis and Reconstruction of Human Joint Regions for Sparse RGBD Images.
Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, 2023

Pulmonary Segments Segmentation with Hierarchical Weak Labels.
Proceedings of the 20th IEEE International Symposium on Biomedical Imaging, 2023

Discovering Language Model Behaviors with Model-Written Evaluations.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

In-context Learning and Induction Heads.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback.
CoRR, 2022

Predictability and Surprise in Large Generative Models.
CoRR, 2022

Predictability and Surprise in Large Generative Models.
Proceedings of the FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21, 2022

2021
A General Language Assistant as a Laboratory for Alignment.
CoRR, 2021