Nathaniel Li
According to our database1,
Nathaniel Li
authored at least 7 papers
between 2023 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.
CoRR, 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.
Proceedings of the International Conference on Machine Learning, 2023