Dan Hendrycks

Thomas Woodside

CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.

CoRR, 2023

Natural Selection Favors AIs over Humans.

CoRR, 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.

Proceedings of the International Conference on Machine Learning, 2023

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022

A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges.

Mohammad Hossein Rohban

Mohammad Sabokrou

Trans. Mach. Learn. Res., 2022

PAC Guarantees and Effective Algorithms for Detecting Novel Categories.

J. Mach. Learn. Res., 2022

Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks.

CoRR, 2022

X-Risk Analysis for AI Research.

CoRR, 2022

Forecasting Future World Events With Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scaling Out-of-Distribution Detection for Real-World Settings.

Mohammadreza Mostajabi

Jacob Steinhardt

Dawn Song

Proceedings of the International Conference on Machine Learning, 2022

A Spectral View of Randomized Smoothing Under Common Corruptions: Benchmarking and Improving Certified Robustness.

Proceedings of the Computer Vision - ECCV 2022, 2022

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines.

CoRR, 2021

Unsolved Problems in ML Safety.

CoRR, 2021

VisDA-2021 Competition: Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data.

CoRR, 2021

The Trojan Detection Challenge.

Proceedings of the NeurIPS 2022 Competition Track, 2021

What Would Jiminy Cricket Do? Towards Agents That Behave Morally.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Coding Challenge Competence With APPS.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Mathematical Problem Solving With the MATH Dataset.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

VisDA-2021 Competition: Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data.

Chandramouli Rajagopalan

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Measuring Massive Multitask Language Understanding.

Proceedings of the 9th International Conference on Learning Representations, 2021

Aligning AI With Shared Human Values.

Proceedings of the 9th International Conference on Learning Representations, 2021

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Natural Adversarial Examples.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty.

Balaji Lakshminarayanan

Proceedings of the 8th International Conference on Learning Representations, 2020

Pretrained Transformers Improve Out-of-Distribution Robustness.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019

A Benchmark for Anomaly Segmentation.

Steven Basart

Mohammadreza Mostajabi

Jacob Steinhardt

Dawn Song

CoRR, 2019

Testing Robustness Against Unforeseen Adversaries.

CoRR, 2019

Transfer of Adversarial Robustness Between Perturbation Types.

CoRR, 2019

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Using Pre-Training Can Improve Model Robustness and Uncertainty.

Kimin Lee

Proceedings of the 36th International Conference on Machine Learning, 2019

Deep Anomaly Detection with Outlier Exposure.

Thomas G. Dietterich

Proceedings of the 7th International Conference on Learning Representations, 2019

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.

Thomas G. Dietterich

Proceedings of the 7th International Conference on Learning Representations, 2019

2018

Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Open Category Detection with PAC Guarantees.

Proceedings of the 35th International Conference on Machine Learning, 2018

2017

Early Methods for Detecting Adversarial Images.

Proceedings of the 5th International Conference on Learning Representations, 2017

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks.

Proceedings of the 5th International Conference on Learning Representations, 2017

2016

Visible Progress on Adversarial Images and a New Saliency Map.

CoRR, 2016

Generalizing and Improving Weight Initialization.

CoRR, 2016

Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units.
