Ethan Perez

According to our database, Ethan Perez authored at least 46 papers between 2016 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Learning from Natural Language Feedback.
Trans. Mach. Learn. Res., 2024

Language Models Learn to Mislead Humans via RLHF.
CoRR, 2024

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs.
CoRR, 2024

When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?
CoRR, 2024

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models.
CoRR, 2024

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought.
CoRR, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training.
CoRR, 2024

Debating with More Persuasive LLMs Leads to More Truthful Answers.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Understanding Sycophancy in Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

2023
Inverse Scaling: When Bigger Isn't Better.
Trans. Mach. Learn. Res., 2023

Towards Evaluating AI Systems for Moral Status Using Self-Reports.
CoRR, 2023

Specific versus General Principles for Constitutional AI.
CoRR, 2023

Towards Understanding Sycophancy in Language Models.
CoRR, 2023

Studying Large Language Model Generalization with Influence Functions.
CoRR, 2023

Measuring Faithfulness in Chain-of-Thought Reasoning.
CoRR, 2023

Question Decomposition Improves the Faithfulness of Model-Generated Reasoning.
CoRR, 2023

Training Language Models with Language Feedback at Scale.
CoRR, 2023

Improving Code Generation by Training with Natural Language Feedback.
CoRR, 2023

The Capacity for Moral Self-Correction in Large Language Models.
CoRR, 2023

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Pretraining Language Models with Human Preferences.
Proceedings of the International Conference on Machine Learning, 2023

Few-shot Adaptation Works with UnpredicTable Data.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Finding and Fixing Undesirable Behaviors in Pretrained Language Models.
PhD thesis, 2022

Discovering Language Model Behaviors with Model-Written Evaluations.
CoRR, 2022

Constitutional AI: Harmlessness from AI Feedback.
CoRR, 2022

Measuring Progress on Scalable Oversight for Large Language Models.
CoRR, 2022

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned.
CoRR, 2022

Language Models (Mostly) Know What They Know.
CoRR, 2022

Learning from Natural Language Feedback.
CoRR, 2022

Single-Turn Debate Does Not Help Humans Answer Hard Reading-Comprehension Questions.
CoRR, 2022

Red Teaming Language Models with Language Models.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

RL with KL penalties is better viewed as Bayesian inference.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

2021
True Few-Shot Learning with Language Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Rissanen Data Analysis: Examining Dataset Characteristics via Description Length.
Proceedings of the 38th International Conference on Machine Learning, 2021

Case-based Reasoning for Natural Language Queries over Knowledge Bases.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Unsupervised Question Decomposition for Question Answering.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

2019
Finding Generalizable Evidence by Learning to Convince Q&A Models.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

ELI5: Long Form Question Answering.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
HoME: a Household Multimodal Environment.
Proceedings of the 6th International Conference on Learning Representations, 2018

Visual Reasoning with Multi-hop Feature Modulation.
Proceedings of the Computer Vision - ECCV 2018, 2018

FiLM: Visual Reasoning with a General Conditioning Layer.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Learning Visual Reasoning Without Strong Priors.
CoRR, 2017

2016
Semi-Supervised Learning with the Deep Rendering Mixture Model.
CoRR, 2016
