Max Nadeau

According to our database, Max Nadeau authored at least 6 papers between 2021 and 2023.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback.
Trans. Mach. Learn. Res., 2023

Circuit Breaking: Removing Model Behaviors with Targeted Ablation.
CoRR, 2023

Measurement Tampering Detection Benchmark.
CoRR, 2023

Discovering Variable Binding Circuitry with Desiderata.
CoRR, 2023

2022
Robust Feature-Level Adversaries are Interpretability Tools.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2021
One Thing to Fool them All: Generating Interpretable, Universal, and Physically-Realizable Adversarial Features.
CoRR, 2021

