Dan Hendrycks

According to our database, Dan Hendrycks authored at least 62 papers between 2016 and 2024.

Bibliography

2024
LLM-PBE: Assessing Data Privacy in Large Language Models.
Proc. VLDB Endow., July, 2024

AI deception: A survey of examples, risks, and potential solutions.
Patterns, 2024

Introduction to AI Safety, Ethics, and Society.
CoRR, 2024

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents.
CoRR, 2024

Tamper-Resistant Safeguards for Open-Weight LLMs.
CoRR, 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
CoRR, 2024

Improving Alignment and Robustness with Circuit Breakers.
CoRR, 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning.
CoRR, 2024

Uncovering Latent Human Wellbeing in Language Model Embeddings.
CoRR, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal.
Proceedings of the Forty-first International Conference on Machine Learning, 2024


Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models.
Trans. Mach. Learn. Res., 2023

Identifying and Mitigating the Security Risks of Generative AI.
Found. Trends Priv. Secur., 2023

Can LLMs Follow Simple Rules?
CoRR, 2023

Representation Engineering: A Top-Down Approach to AI Transparency.
CoRR, 2023

Identifying and Mitigating the Security Risks of Generative AI.
CoRR, 2023

An Overview of Catastrophic AI Risks.
CoRR, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark.
CoRR, 2023

Natural Selection Favors AIs over Humans.
CoRR, 2023

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the Machiavelli Benchmark.
Proceedings of the International Conference on Machine Learning, 2023

MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

2022
A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges.
Trans. Mach. Learn. Res., 2022

PAC Guarantees and Effective Algorithms for Detecting Novel Categories.
J. Mach. Learn. Res., 2022

Actionable Guidance for High-Consequence AI Risk Management: Towards Standards Addressing AI Catastrophic Risks.
CoRR, 2022

X-Risk Analysis for AI Research.
CoRR, 2022

Forecasting Future World Events With Neural Networks.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

OpenOOD: Benchmarking Generalized Out-of-Distribution Detection.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Scaling Out-of-Distribution Detection for Real-World Settings.
Proceedings of the International Conference on Machine Learning, 2022

A Spectral View of Randomized Smoothing Under Common Corruptions: Benchmarking and Improving Certified Robustness.
Proceedings of the Computer Vision - ECCV 2022, 2022

PixMix: Dreamlike Pictures Comprehensively Improve Safety Measures.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Certified Adversarial Defenses Meet Out-of-Distribution Corruptions: Benchmarking Robustness and Simple Baselines.
CoRR, 2021

Unsolved Problems in ML Safety.
CoRR, 2021

VisDA-2021 Competition Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data.
CoRR, 2021


What Would Jiminy Cricket Do? Towards Agents That Behave Morally.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Coding Challenge Competence With APPS.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Measuring Mathematical Problem Solving With the MATH Dataset.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

VisDA-2021 Competition: Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data.
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Measuring Massive Multitask Language Understanding.
Proceedings of the 9th International Conference on Learning Representations, 2021

Aligning AI With Shared Human Values.
Proceedings of the 9th International Conference on Learning Representations, 2021

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Natural Adversarial Examples.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty.
Proceedings of the 8th International Conference on Learning Representations, 2020

Pretrained Transformers Improve Out-of-Distribution Robustness.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
A Benchmark for Anomaly Segmentation.
CoRR, 2019

Testing Robustness Against Unforeseen Adversaries.
CoRR, 2019

Transfer of Adversarial Robustness Between Perturbation Types.
CoRR, 2019

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Using Pre-Training Can Improve Model Robustness and Uncertainty.
Proceedings of the 36th International Conference on Machine Learning, 2019

Deep Anomaly Detection with Outlier Exposure.
Proceedings of the 7th International Conference on Learning Representations, 2019

Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.
Proceedings of the 7th International Conference on Learning Representations, 2019

2018
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Open Category Detection with PAC Guarantees.
Proceedings of the 35th International Conference on Machine Learning, 2018

2017
Early Methods for Detecting Adversarial Images.
Proceedings of the 5th International Conference on Learning Representations, 2017

A Baseline for Detecting Misclassified and Out-of-Distribution Examples in Neural Networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
Visible Progress on Adversarial Images and a New Saliency Map.
CoRR, 2016

Generalizing and Improving Weight Initialization.
CoRR, 2016

Bridging Nonlinearities and Stochastic Regularizers with Gaussian Error Linear Units.
CoRR, 2016

