2025
Humanity's Last Exam.
CoRR, January, 2025

Planning in Natural Language Improves LLM Search for Code Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024
Planning In Natural Language Improves LLM Search For Code Generation.
CoRR, 2024

LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet.
CoRR, 2024

NATURAL PLAN: Benchmarking LLMs on Natural Language Planning.
CoRR, 2024

A Careful Examination of Large Language Model Performance on Grade School Arithmetic.
CoRR, 2024

Easy as ABCs: Unifying Boltzmann Q-Learning and Counterfactual Regret Minimization.
CoRR, 2024

A Careful Examination of Large Language Model Performance on Grade School Arithmetic.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Learning Goal-Conditioned Representations for Language Reward Models.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

2023
Chain-of-Thought Reasoning is a Policy Improvement Operator.
CoRR, 2023

No-regret Learning Dynamics for Sequential Correlated Equilibria.
Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023

2022
A Simple Adaptive Procedure Converging to Forgiving Correlated Equilibria.
CoRR, 2022

Equilibrium Finding in Normal-Form Games via Greedy Regret Minimization.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2020
Trading Off Diversity and Quality in Natural Language Generation.
CoRR, 2020

2019
Unifying Human and Statistical Evaluation for Natural Language Generation.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019