Debatable Intelligence: Benchmarking LLM Judges via Debate Speech Evaluation.
CoRR, June, 2025
An Analysis of Hyper-Parameter Optimization Methods for Retrieval Augmented Generation.
CoRR, May, 2025
WildIFEval: Instruction Following in the Wild.
CoRR, March, 2025
The Mighty ToRR: A Benchmark for Table Reasoning and Robustness.
CoRR, February, 2025
JuStRank: Benchmarking LLM Judges for System Ranking.
CoRR, 2024
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation.
CoRR, 2024
Efficient Benchmarking (of Language Models).
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations, 2024
Label-Efficient Model Selection for Text Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Active Learning for Natural Language Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Zero-Shot Text Classification with Self-Training.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Cluster & Tune: Boost Cold Start Performance in Text Classification.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Active Learning for BERT: An Empirical Study.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Corpus Wide Argument Mining - A Working Solution.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Financial Event Extraction Using Wikipedia-Based Weak Supervision.
CoRR, 2019
A Dataset of General-Purpose Rebuttal.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019
Argument Invention from First Principles.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019