Andy Zou
According to our database1,
Andy Zou
authored at least 22 papers
between 2021 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
On csauthors.net:
Bibliography
2024
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Proceedings of the Forty-first International Conference on Machine Learning, 2024
2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023
CoRR, 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.
CoRR, 2023
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.
Proceedings of the International Conference on Machine Learning, 2023
2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
2021
Proceedings of the NeurIPS 2022 Competition Track, 2021
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021
Proceedings of the 9th International Conference on Learning Representations, 2021