Nathaniel Li

According to our database¹, Nathaniel Li authored at least 7 papers between 2023 and 2024.

Collaborative distances:

Dijkstra number² of four.
Erdős number³ of four.

Bibliography

2024

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet.

CoRR, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.

Ann-Kathrin Dombrowski

Justin Tienken-Harder

Kallol Krishna Karmakar

Steven Basart

Stephen Fitz

Mindy Levine

Ponnurangam Kumaraguru

CoRR, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023

Representation Engineering: A Top-Down Approach to AI Transparency.

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.

Proceedings of the International Conference on Machine Learning, 2023
