2025

BEARCUBS: A benchmark for computer-using web agents.

Yixiao Song

Katherine Thai

CoRR, March 2025

CLIPPER: Compression enables long-context synthetic data generation.

Chau Minh Pham

Yapei Chang

Mohit Iyyer

CoRR, February 2025

2024

FABLES: Evaluating faithfulness and content selection in book-length summarization.

CoRR, 2024

BooookScore: A systematic exploration of book-length summarization in the era of LLMs.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

PostMark: A Robust Blackbox Watermark for Large Language Models.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

2022

RankGen: Improving Text Generation with Large Ranking Models.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

RELiC: Retrieving Evidence for Literary Claims.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022